Claude Computer Use Best Practices: How Screenshot Pruning and Prompt Caching Lower Agent Costs

May 20, 2026 · 7 min read

Browser Agents Spend Money Through Screenshots

Claude's guidance for production computer and browser use makes one cost driver very clear: screenshots are expensive context. A single screenshot can add roughly 1,000-1,800 tokens, and a 200k context window can fill in under 100 screenshots once instructions, tool results, and other history share the window. For browser agents that inspect UI state every turn, visual context can become the largest part of the bill.
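As a rough check on that arithmetic, here is a minimal Python sketch. The function name and the 30,000-token overhead are illustrative assumptions, not figures from Claude's guidance; only the 200k window and the 1,000-1,800 token range come from the text above.

```python
def screenshots_until_full(context_window: int,
                           tokens_per_screenshot: int,
                           fixed_overhead: int = 0) -> int:
    """Whole screenshots that fit after reserving fixed_overhead tokens."""
    if tokens_per_screenshot <= 0:
        raise ValueError("tokens_per_screenshot must be positive")
    usable = max(context_window - fixed_overhead, 0)
    return usable // tokens_per_screenshot


# Screenshots alone, at the heavy and light ends of the range.
heavy = screenshots_until_full(200_000, 1_800)   # 111
light = screenshots_until_full(200_000, 1_000)   # 200

# Assumed 30,000 tokens of instructions, tool results, and text history.
with_overhead = screenshots_until_full(200_000, 1_800, fixed_overhead=30_000)  # 94
```

With heavy screenshots and a modest amount of other context, the window is full in fewer than 100 screenshots.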

This is why UI agents need cost engineering, not just prompt engineering. If an agent keeps every screenshot, every tool result, and every failed click in the conversation, costs rise even when the actual task is simple. The fix is to manage screenshots, context, caching, and model choice as first-class parts of the architecture.

Start With the Right Resolution

High-resolution screenshots are not always better. Claude's guidance recommends starting around 1280×720 for most Claude 4.6-family computer-use tasks, and around 1080p for Opus 4.7. Screenshots that exceed model or API limits may be downscaled internally, which can reduce click accuracy while still wasting tokens and bandwidth.

For cost-sensitive browser automation, the rule is simple: send the smallest screenshot that preserves the UI details needed for the decision. If the task involves a tiny control or a dense table, zoom or crop the relevant region instead of sending a full 4K desktop image.
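One way to apply the resolution advice is to downscale on the client before sending, then map the model's click coordinates back to the real screen. The sketch below only does the arithmetic; the function names are hypothetical, the 1280×720 target is the starting point mentioned above, and the actual image resizing is left to whatever imaging library you use.

```python
def fit_within(width: int, height: int,
               max_width: int = 1280, max_height: int = 720):
    """Return (new_width, new_height, scale), preserving aspect ratio.

    Images that already fit are left at their original size (scale 1.0).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale), scale


def to_source_coords(x: int, y: int, scale: float):
    """Map a coordinate in the downscaled image back to the original screen."""
    return round(x / scale), round(y / scale)


# A 4K desktop is reduced to 1280x720 before it is sent.
new_w, new_h, scale = fit_within(3840, 2160)

# A click at the center of the small image lands at the center of the real screen.
click = to_source_coords(640, 360, scale)   # (1920, 1080)
```

Scaling on your side keeps the coordinate mapping under your control instead of relying on internal downscaling.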

Use Cache-Aware Screenshot Pruning

The most practical technique is screenshot pruning. Keep only a few recent screenshots, such as the last three, and replace older screenshots with short text placeholders. This preserves the shape of the history, so the sequence of turns and tool calls stays intact, while removing the largest token blocks.

Claude's guidance also suggests pruning older screenshots in batches instead of one at a time. That matters for prompt caching: every edit to earlier messages changes the conversation prefix, and a changed prefix cannot be served from cache. Pruning in batches, for example every 25 turns, leaves the prefix untouched between prunes and improves the odds of cache hits.

| Strategy | What it saves | Best for |
| --- | --- | --- |
| Keep last 3 screenshots | Visual context tokens | Long browser sessions |
| Batch prune every 25 turns | Cache invalidation | Cached agent loops |
| Replace images with summaries | Old image tokens | Auditable history |
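The pruning strategies above can be sketched in a few lines of Python. This assumes a simplified message shape in which each message has a "content" list of blocks with a "type" key; real tool-result payloads nest screenshots more deeply, and the function names and placeholder text are hypothetical.

```python
PLACEHOLDER = "[screenshot removed to save context]"


def prune_screenshots(messages, keep_last=3):
    """Return a copy where all but the newest keep_last images become text."""
    positions = [
        (mi, bi)
        for mi, msg in enumerate(messages)
        for bi, block in enumerate(msg["content"])
        if block.get("type") == "image"
    ]
    to_drop = set(positions[:-keep_last]) if keep_last > 0 else set(positions)

    pruned = []
    for mi, msg in enumerate(messages):
        blocks = [
            {"type": "text", "text": PLACEHOLDER} if (mi, bi) in to_drop else block
            for bi, block in enumerate(msg["content"])
        ]
        pruned.append({**msg, "content": blocks})
    return pruned


def maybe_prune(messages, turn, every=25, keep_last=3):
    """Prune only on every Nth turn so the prefix stays unchanged in between."""
    if turn > 0 and turn % every == 0:
        return prune_screenshots(messages, keep_last)
    return messages
```

Between prunes the history is returned unchanged, which is what lets a cached prefix keep matching from one turn to the next.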

Prompt Caching Is the Main Cost Lever

Browser agents often have a large stable prefix: system instructions, safety rules, tool descriptions, application context, and workflow constraints. Prompt caching can make that repeated prefix cheaper when it stays stable. Claude's guidance recommends using cache breakpoints on the stable prefix and recent tool results, then clearing and replacing trailing cache markers each turn.

The practical design goal is to keep stable content stable. Put long-lived instructions first, dynamic observations later, and avoid rewriting the system prompt on every step. Cache-aware message layout can be the difference between a browser agent that is affordable and one that burns budget on repeated context.
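A minimal sketch of that layout follows, assuming the system prompt and messages are lists of content-block dicts and that cache breakpoints are expressed as a `cache_control` field of type "ephemeral" on a block. The helper name is hypothetical, and placing the trailing marker on the last block of the newest message is one reasonable reading of the guidance, not the only one.

```python
import copy

EPHEMERAL = {"type": "ephemeral"}


def mark_cache_breakpoints(system_blocks, messages):
    """Return copies with one marker on the stable prefix and one on the tail."""
    system_blocks = copy.deepcopy(system_blocks)
    messages = copy.deepcopy(messages)

    # Clear markers left over from earlier turns so they do not accumulate.
    for msg in messages:
        for block in msg["content"]:
            block.pop("cache_control", None)

    # Stable prefix: mark the end of the long-lived system instructions.
    if system_blocks:
        system_blocks[-1]["cache_control"] = dict(EPHEMERAL)

    # Trailing marker: mark the newest block, typically the latest tool result.
    if messages and messages[-1]["content"]:
        messages[-1]["content"][-1]["cache_control"] = dict(EPHEMERAL)

    return system_blocks, messages
```

Calling this once per turn, just before the request is built, keeps the number of markers constant while the trailing one moves forward with the conversation.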

Model Routing: Sonnet for Default, Opus for Hard Cases

Model choice directly affects UI agent cost. Claude's guidance positions Sonnet 4.6 as the default balance of click precision, reasoning, and cost, with Opus 4.7 reserved for harder reasoning or high-resolution sources. The current estimator prices Claude Sonnet 4.6 at $3.00 per million input tokens and $15.00 per million output tokens, while Claude Opus 4.7 is $5.00 input and $25.00 output.
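To see what those prices mean for a session, here is a small Python sketch using only the per-million rates quoted above. The dictionary keys are illustrative labels rather than API model identifiers, the token counts are made up for the example, and caching discounts are deliberately left out.

```python
# (input USD, output USD) per million tokens, as quoted in the text above.
PRICES_PER_MTOK = {
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached cost of a session for the given token totals."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical session: 2M input tokens (mostly screenshots), 100k output tokens.
sonnet_cost = cost_usd("sonnet-4.6", 2_000_000, 100_000)   # 7.50
opus_cost = cost_usd("opus-4.7", 2_000_000, 100_000)       # 12.50
```

In this example input tokens account for most of the bill on both models, which is why trimming visual context matters more than trimming output.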

A planner/executor split can reduce cost: use a cheaper model for mechanical UI steps and escalate to a stronger model only when the agent gets stuck, needs recovery, or must make a complex decision. This is especially effective in browser workflows where many turns are simple observations and clicks.
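The escalation rule can be as simple as the sketch below. The state fields, the failure threshold of two, and the "sonnet" and "opus" labels are all assumptions made for illustration; a real router would use whatever signals your agent loop already tracks.

```python
from dataclasses import dataclass


@dataclass
class StepState:
    consecutive_failures: int = 0        # failed clicks or retries in a row
    needs_complex_decision: bool = False  # set by the planner, if there is one


def choose_model(state: StepState,
                 default: str = "sonnet",
                 strong: str = "opus",
                 max_failures: int = 2) -> str:
    """Use the cheaper model unless the agent is stuck or the step is hard."""
    if state.needs_complex_decision or state.consecutive_failures >= max_failures:
        return strong
    return default


routine = choose_model(StepState())                              # "sonnet"
stuck = choose_model(StepState(consecutive_failures=2))          # "opus"
hard = choose_model(StepState(needs_complex_decision=True))      # "opus"
```

Resetting `consecutive_failures` after a successful step lets the agent fall back to the cheaper model once it recovers.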

Batch Mechanical Actions

Browser and computer batch tools can combine multiple independent actions into one tool call. Filling several form fields, pressing a sequence of keys, or opening multiple independent pages does not always need a full model round trip between every action. Batching reduces latency and can reduce the number of observation turns that add screenshots to the context.

Do not batch actions when later steps depend on seeing the result of earlier steps. The goal is to remove unnecessary turns, not to make the agent blind.
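One way to encode that rule is to flag the actions whose result must be seen and close the batch there. This is a sketch under assumptions: the action dicts and the `needs_observation` flag are hypothetical, and it does not call any particular batch tool.

```python
def plan_batches(actions):
    """Group consecutive actions, ending a batch wherever a result must be seen."""
    batches, current = [], []
    for action in actions:
        current.append(action)
        if action.get("needs_observation"):
            # The next step depends on this outcome, so stop and take a look.
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


form_steps = [
    {"do": "type", "field": "name"},
    {"do": "type", "field": "email"},
    {"do": "click", "target": "submit", "needs_observation": True},
    {"do": "click", "target": "confirm"},
]
batches = plan_batches(form_steps)   # two batches: three actions, then one
```

Here four actions become two tool calls, and the agent still observes the page after the submit click before it continues.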

Bottom Line

Production browser agents are not priced like simple chatbots. They carry screenshots, tool logs, long histories, and repeated instructions. The biggest savings come from screenshot pruning, prompt caching, compacting older history, batching, and model routing.

If you are building a UI agent or browser automation workflow, model its token cost before launch. Use the AI Cost Estimator to compare Sonnet, Opus, GPT, Gemini, and budget models for the same task shape.