Why pasting screenshots into Claude eats your tokens (and how OCR fixes it)

May 26, 2026 · Manuel Toledo

Quick answer

Anthropic prices Claude images at roughly (width × height) / 750 tokens. A 1080p screenshot of a stack trace costs around 1,568 tokens on Sonnet and about 2,765 on Opus, where larger captures can reach 4,784. The same text typed out is 150–300 tokens. For text-heavy screenshots (terminal output, errors, log dumps, PDF excerpts), feeding Claude the OCR'd text instead is 5–10× cheaper and the answer is usually at least as good. Maus OCRs every copied screenshot automatically and locally, so the text version is already in your clipboard before you need it.

How Claude charges for images

Anthropic publishes the formula in the vision docs:

tokens ≈ (width × height) / 750, capped at 1,568 on Sonnet and 4,784 on Opus

Anything larger is downscaled before counting, so it is billed at the cap rather than above it. The math is mechanical, so you can run the numbers yourself for any screenshot.

| Screenshot | Pixels | Sonnet tokens | Opus tokens |
|---|---|---|---|
| Small UI snippet | 200 × 200 | ~54 | ~54 |
| Single-monitor window | 800 × 600 | ~640 | ~640 |
| Full 1080p screen | 1920 × 1080 | 1,568 (capped) | ~2,765 |
| Retina screen (4K) | 3840 × 2160 | 1,568 (capped) | 4,784 (capped) |
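To run the numbers yourself, here is a minimal Python sketch of the estimate. It assumes the formula and the two caps exactly as quoted in this article; the function name and the `TOKEN_CAPS` table are illustrative, not part of any Anthropic SDK, and the real tokenizer may round differently.

```python
import math

# Assumption: caps as quoted in this article. Check the current vision docs
# before relying on them for billing estimates.
TOKEN_CAPS = {"sonnet": 1568, "opus": 4784}
PIXELS_PER_TOKEN = 750


def estimate_image_tokens(width: int, height: int, model: str = "sonnet") -> int:
    """Estimate image tokens as (width * height) / 750, limited by the model cap."""
    raw = math.ceil(width * height / PIXELS_PER_TOKEN)
    return min(raw, TOKEN_CAPS[model])


# The four screenshot sizes discussed above.
sizes = {
    "Small UI snippet": (200, 200),
    "Single-monitor window": (800, 600),
    "Full 1080p screen": (1920, 1080),
    "Retina screen (4K)": (3840, 2160),
}

for label, (w, h) in sizes.items():
    sonnet = estimate_image_tokens(w, h, "sonnet")
    opus = estimate_image_tokens(w, h, "opus")
    print(f"{label}: {w}x{h} -> Sonnet ~{sonnet}, Opus ~{opus}")
```

Rounding up is a deliberate choice here: it keeps the estimate on the conservative side when you are budgeting context.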

What the same content costs as text

Take a typical stack trace screenshot — a 1080p window with 15 lines of output. Pasted as an image: ~1,568 tokens on Sonnet. Typed as plain text: ~200 tokens. Same information, ~8× cheaper.
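The comparison can be sketched in a few lines of Python. The text-side estimate uses a common rule of thumb of about 4 characters per token, which is an assumption and not Claude's actual tokenizer; both function names are hypothetical.

```python
import math


def estimate_text_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough text token count. Assumes ~4 characters per token (a heuristic)."""
    return math.ceil(len(text) / chars_per_token)


def image_to_text_ratio(image_tokens: int, text_tokens: int) -> float:
    """How many times more tokens the image costs than the same content as text."""
    return image_tokens / text_tokens


# The figures from the stack trace example: 1,568 image tokens vs ~200 text tokens.
print(round(image_to_text_ratio(1568, 200), 1))  # 7.8

# For your own capture, paste the OCR'd text in and compare.
ocr_text = "x" * 800  # placeholder standing in for about 800 characters of traceback
print(estimate_text_tokens(ocr_text))  # 200
```

For an exact text count, use a real token counter; the heuristic is only meant to show the order of magnitude.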

The ratio holds for almost everything devs screenshot: terminal output, error messages, log dumps, PDF excerpts.

For all of these, the image is doing zero extra work for Claude. The model has to OCR the screenshot internally anyway — you're paying for pixels that resolve back to the same tokens it would have read directly from text.

When this actually matters

For a one-off question, the token cost is negligible. The case where it bites is a long session, where every pasted screenshot eats into the context budget.
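To put a number on how the cost accumulates, here is a rough Python sketch. It assumes the per-screenshot figures used earlier in this article (about 1,568 tokens as an image, about 200 as text); the screenshot counts are hypothetical, and the function name is illustrative.

```python
def extra_tokens_from_images(
    screenshots: int,
    image_tokens: int = 1568,  # assumed: 1080p screenshot at the Sonnet cap
    text_tokens: int = 200,    # assumed: same content as plain text
) -> int:
    """Extra tokens spent by pasting screenshots as images instead of text."""
    return screenshots * (image_tokens - text_tokens)


# Hypothetical session sizes, for illustration only.
for count in (1, 10, 30):
    print(f"{count} screenshots -> {extra_tokens_from_images(count)} extra tokens")
```

One screenshot is noise; a few dozen in a single session is tens of thousands of tokens that could have gone to code or conversation instead.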

When the image is still the right call

OCR'd text isn't a strict replacement. Keep the image when layout, hierarchy, or geometry is the question.

For everything else where the screenshot is text wearing a pixel costume, the text version wins.

How automatic OCR fits

The friction has always been timing: by the time you've decided "I should send the text instead of the image", you've already taken the screenshot, and retyping a 15-line traceback is annoying. macOS Live Text handles this if you save the screenshot to a file and open it in Preview, but most devs don't; they hit ⌘⇧⌃4 and send it straight to the clipboard.

Maus does this automatically. Any time you copy a screenshot (or any image with text), Maus runs OCR using Apple's Vision framework (locally, with no upload) and adds the recognized text as a separate clipboard item right below the image. Two clips, both available. On the next paste, you choose which one to send: the image or the text.

No setup. No "convert this screenshot" step. The text version is just there, in your history, searchable.
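The two-clip behavior can be modeled with a small Python sketch. This is not Maus's implementation: `Clip`, `ClipboardHistory`, and the injected `ocr` callable are all hypothetical names, and the OCR step is stubbed out where a real tool would call an on-device engine such as Apple's Vision framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Clip:
    kind: str                 # "image" or "text"
    data: Union[bytes, str]   # raw image bytes or recognized text


@dataclass
class ClipboardHistory:
    ocr: Callable[[bytes], str]                      # hypothetical OCR hook
    clips: List[Clip] = field(default_factory=list)  # newest first

    def copy_image(self, image: bytes) -> None:
        """Store the image and, if OCR finds text, a text clip right below it."""
        text = self.ocr(image).strip()
        new_clips = [Clip("image", image)]
        if text:
            new_clips.append(Clip("text", text))
        self.clips = new_clips + self.clips

    def search(self, query: str) -> List[Clip]:
        """Case-insensitive search over text clips only."""
        q = query.lower()
        return [c for c in self.clips if c.kind == "text" and q in c.data.lower()]


# Stub OCR for illustration; it ignores the image and returns fixed text.
def fake_ocr(image: bytes) -> str:
    return "KeyError: 'user_id'\n"


history = ClipboardHistory(ocr=fake_ocr)
history.copy_image(b"fake-png-bytes")
print([clip.kind for clip in history.clips])
print(history.search("keyerror")[0].data)
```

Keeping the image and the text as separate clips, rather than replacing one with the other, is what leaves the choice open at paste time.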

Three concrete workflows

1. Pasting a terminal error into Claude Code

You see a stack trace in Warp. Old workflow: take a screenshot, paste image into Claude Code. ~1,500 tokens. New workflow: ⌘⇧⌃4 to capture, paste the OCR'd text instead. ~150 tokens. Same answer.

Even simpler: don't screenshot at all. Just select the terminal text and copy. But if you've already screenshotted (faster for partial selections on a busy terminal), the OCR'd text is the cheap path.

2. Feeding Claude an excerpt from a PDF

Most PDFs in Preview support text selection, but tables, scanned pages, and figures don't. Screenshot the section, let Maus OCR it, and paste the text into Claude. This works for anything you can see on screen, including image-only PDFs.

3. Capturing a code snippet from a video

Conference talk, screencast, tutorial. Pause, screenshot the code shown on screen, and the OCR'd text lands in your clipboard. Paste into Cursor or Claude to ask "explain this" or "port this to Rust". No transcription.

Privacy and accuracy

Apple Vision (the OCR engine in Maus and Live Text) runs entirely on-device. Nothing about the screenshot leaves your Mac. For internal logs, error messages with paths, customer data — this matters. Cloud OCR (Google Vision, AWS Textract) uploads the image; for sensitive content, that's a no.

Accuracy is high for clean rendered text (terminal output, IDE code, web pages). It's lower for handwriting, decorative fonts, or photos of screens with reflections. For the vast majority of dev screenshots, it's accurate enough that pasting the text into Claude gives the same answer as pasting the image.

FAQ

How many tokens does a screenshot cost in Claude?

Roughly (width × height) / 750, capped at 1,568 on Sonnet 4.6 and earlier and at 4,784 on Opus 4.7/4.8. A 1080p screenshot hits the Sonnet cap. The same text is usually 150–300 tokens.

Does OCR change Claude's answer quality?

For text-heavy screenshots — terminal output, stack traces, code — no. The image and the text resolve to the same content. Keep the image when layout, hierarchy, or geometry is the question.

Is screenshot OCR private?

Depends on the tool. macOS Live Text and Maus use Apple Vision locally — the image never leaves your Mac. Cloud OCR services upload it. Use local OCR for anything sensitive.

Why not just retype the text from the screenshot?

You can. But 15 lines of traceback is 20+ seconds and a typo risk. Automatic OCR makes the text version available the same moment the screenshot lands.

What about Claude Code's image attachment?

Same cost model. Image tokens scale with pixels; text tokens scale with characters. For long sessions where context budget matters, feeding text instead of images is what keeps you under it.

Stop paying for pixels when text is what Claude needs

Maus runs OCR on every copied screenshot, locally, using Apple Vision. The text version sits in your clipboard ready to paste. Free with 24h history. Pro $12.99 once for unlimited.
