Claude Computer Use Best Practices: How Screenshot Pruning and Prompt Caching Lower Agent Costs
May 20, 2026 · 7 min read
Browser Agents Spend Money Through Screenshots
Claude's guidance for production computer and browser use makes one cost driver very clear: screenshots are expensive context. A single screenshot can add roughly 1,000-1,800 tokens, so a 200k context window can fill in under 100 screenshots once instructions, tool definitions, and tool results share the window. For browser agents that inspect UI state every turn, visual context can become the largest part of the bill.
This is why UI agents need cost engineering, not just prompt engineering. If an agent keeps every screenshot, every tool result, and every failed click in the conversation, costs rise even when the actual task is simple. The fix is to manage screenshots, context, caching, and model choice as first-class parts of the architecture.
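The screenshot arithmetic above can be checked with a few lines of Python. This is a rough sketch: the function name and the 30,000-token reserve for instructions and tool results are illustrative assumptions, not figures from Claude's guidance.

```python
def screenshots_until_full(context_window: int,
                           tokens_per_screenshot: int,
                           reserved_tokens: int = 0) -> int:
    """Return how many screenshots fit in the window after reserving
    space for non-image context (system prompt, tool results, and so on)."""
    available = max(context_window - reserved_tokens, 0)
    return available // tokens_per_screenshot


# Images only, at the heavy and light ends of the 1,000-1,800 token range.
print(screenshots_until_full(200_000, 1_800))  # 111
print(screenshots_until_full(200_000, 1_000))  # 200

# Assumed 30,000 tokens of instructions and tool output (hypothetical figure).
print(screenshots_until_full(200_000, 1_800, reserved_tokens=30_000))  # 94
```

With even a modest reserve for non-image context, heavy screenshots cross the "under 100" mark well before a long session ends.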
Start With the Right Resolution
High-resolution screenshots are not always better. Claude's guidance recommends starting around 1280×720 for most Claude 4.6-family computer-use tasks, and around 1080p for Opus 4.7. Screenshots that exceed model or API limits may be downscaled internally, which can reduce click accuracy while still wasting tokens and bandwidth.
For cost-sensitive browser automation, the rule is simple: send the smallest screenshot that preserves the UI details needed for the decision. If the task involves a tiny control or a dense table, zoom or crop the relevant region instead of sending a full 4K desktop image.
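One way to apply the "smallest useful screenshot" rule is to compute a target size before capturing or resizing. The sketch below is an assumption-laden illustration: `fit_within` and `scale_click` are hypothetical helper names, the 1280×720 default comes from the starting point mentioned above, and the actual resize would be done by whatever imaging library your agent already uses.

```python
def fit_within(width: int, height: int,
               max_width: int = 1280, max_height: int = 720) -> tuple[int, int]:
    """Scale (width, height) down to fit the bounding box while keeping
    the aspect ratio. Images that already fit are left unchanged."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def scale_click(x: int, y: int,
                sent_size: tuple[int, int],
                screen_size: tuple[int, int]) -> tuple[int, int]:
    """Map a click coordinate chosen on the downscaled image back to
    the real screen resolution."""
    scale_x = screen_size[0] / sent_size[0]
    scale_y = screen_size[1] / sent_size[1]
    return round(x * scale_x), round(y * scale_y)


sent = fit_within(3840, 2160)                        # 4K desktop
print(sent)                                          # (1280, 720)
print(scale_click(640, 360, sent, (3840, 2160)))     # (1920, 1080)
```

Doing the downscale yourself, rather than relying on internal downscaling, keeps the coordinate mapping explicit and under your control.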
Use Cache-Aware Screenshot Pruning
The most practical technique is screenshot pruning. Keep only a few recent screenshots, such as the last three, and replace older screenshots with short text placeholders. This preserves the history shape while removing the largest token blocks.
Claude's guidance also suggests pruning older screenshots in batches instead of one at a time. This matters for prompt caching: a cache hit requires an unchanged prefix, so rewriting earlier messages every turn breaks cache reuse. Pruning in batches, for example every 25 turns, leaves the prefix untouched between prunes and improves the odds of cache hits.
| Strategy | What it saves | Best for |
|---|---|---|
| Keep last 3 screenshots | Visual context tokens | Long browser sessions |
| Batch prune every 25 turns | Cache misses from prefix rewrites | Cached agent loops |
| Replace images with summaries | Old image tokens | Auditable history |
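The pruning strategies above can be sketched as a single helper. This is a minimal illustration, not the guidance's reference implementation: it assumes messages are plain dicts whose `content` is a list of blocks, with screenshots stored as `image` blocks either directly or inside a `tool_result` block. The function names, placeholder text, and defaults are hypothetical.

```python
PLACEHOLDER = {"type": "text", "text": "[screenshot removed to save context]"}


def _image_slots(messages):
    """Yield (container_list, index) for every image block, oldest first."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue
        for i, block in enumerate(content):
            if block.get("type") == "image":
                yield content, i
            elif (block.get("type") == "tool_result"
                  and isinstance(block.get("content"), list)):
                for j, inner in enumerate(block["content"]):
                    if inner.get("type") == "image":
                        yield block["content"], j


def prune_screenshots(messages, turn, keep_recent=3, prune_every=25):
    """On every `prune_every`-th turn, replace all but the newest
    `keep_recent` screenshots with a text placeholder.

    Mutates `messages` in place and returns the number of images replaced.
    On all other turns the history is left untouched, so a cached prefix
    stays valid.
    """
    if turn == 0 or turn % prune_every != 0:
        return 0
    slots = list(_image_slots(messages))
    stale = slots[:-keep_recent] if keep_recent > 0 else slots
    for container, index in stale:
        container[index] = dict(PLACEHOLDER)
    return len(stale)
```

The trade-off is that up to `keep_recent + prune_every` screenshots can accumulate between prunes, so the batch size is a balance between image tokens and cache stability.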
Prompt Caching Is the Main Cost Lever
Browser agents often have a large stable prefix: system instructions, safety rules, tool descriptions, application context, and workflow constraints. Prompt caching can make that repeated prefix cheaper when it stays stable. Claude's guidance recommends using cache breakpoints on the stable prefix and recent tool results, then clearing and replacing trailing cache markers each turn.
The practical design goal is to keep stable content stable. Put long-lived instructions first, dynamic observations later, and avoid rewriting the system prompt on every step. Cache-aware message layout can be the difference between a browser agent that is affordable and one that burns budget on repeated context.
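A cache-aware layout can be sketched as two small payload helpers. The `cache_control: {"type": "ephemeral"}` field follows Anthropic's prompt caching format for content blocks. The helper names and message shapes are illustrative assumptions, and the code only builds request data without calling any API.

```python
CACHE_MARK = {"type": "ephemeral"}


def build_system(stable_instructions: str) -> list[dict]:
    """Return a system prompt block with a cache breakpoint at its end.
    Keep this text identical across turns so the prefix stays cacheable."""
    return [{
        "type": "text",
        "text": stable_instructions,
        "cache_control": dict(CACHE_MARK),
    }]


def move_trailing_marker(messages: list[dict]) -> None:
    """Clear cache markers from all message blocks, then mark the last
    block of the newest message that has list content.

    Only the marker moves; earlier message content is not rewritten.
    """
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for block in content:
                block.pop("cache_control", None)
    for message in reversed(messages):
        content = message.get("content")
        if isinstance(content, list) and content:
            content[-1]["cache_control"] = dict(CACHE_MARK)
            break
```

Calling `move_trailing_marker(messages)` just before each request keeps a single trailing breakpoint on the newest tool result, alongside the fixed breakpoint on the system prompt. Check the provider's documentation for current limits on the number of breakpoints per request.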
Model Routing: Sonnet for Default, Opus for Hard Cases
Model choice directly affects UI agent cost. Claude's guidance positions Sonnet 4.6 as the default balance of click precision, reasoning, and cost, with Opus 4.7 reserved for harder reasoning or high-resolution sources. The AI Cost Estimator currently lists Claude Sonnet 4.6 at $3.00 per million input tokens and $15.00 per million output tokens, and Claude Opus 4.7 at $5.00 per million input tokens and $25.00 per million output tokens.
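To see what those list prices mean for a session, a simple estimate multiplies token counts by the per-million rates. The sketch below uses the prices quoted above. The dictionary keys are illustrative labels rather than API model IDs, the example token counts are made up, and caching discounts are deliberately ignored.

```python
# USD per million tokens, as quoted in this article.
PRICES_PER_MTOK = {
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.7": {"input": 5.00, "output": 25.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the uncached cost in USD for the given token counts."""
    price = PRICES_PER_MTOK[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical session: 2M input tokens (mostly screenshots), 100k output.
print(estimate_cost("sonnet-4.6", 2_000_000, 100_000))  # 7.5
print(estimate_cost("opus-4.7", 2_000_000, 100_000))    # 12.5
```

Because input tokens dominate in screenshot-heavy sessions, the input rate and the cache hit rate usually matter more than the output rate.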
A planner/executor split can reduce cost: use a cheaper model for mechanical UI steps and escalate to a stronger model only when the agent gets stuck, needs recovery, or must make a complex decision. This is especially effective in browser workflows where many turns are simple observations and clicks.
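The escalation rule can be as small as one function. This is a hedged sketch: `StepState`, its fields, the model labels, and the failure threshold of 2 are all hypothetical choices for illustration, not part of Claude's guidance.

```python
from dataclasses import dataclass


@dataclass
class StepState:
    """Minimal per-step signals an agent loop might track (assumed fields)."""
    consecutive_failures: int = 0
    needs_planning: bool = False


def choose_model(state: StepState,
                 default_model: str = "sonnet",
                 strong_model: str = "opus",
                 failure_threshold: int = 2) -> str:
    """Use the default model for mechanical steps and escalate when the
    agent is stuck or the step needs a complex decision."""
    if state.needs_planning or state.consecutive_failures >= failure_threshold:
        return strong_model
    return default_model
```

Resetting `consecutive_failures` after a successful step lets the loop drop back to the default model once recovery is complete.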
Batch Mechanical Actions
Browser and computer batch tools can combine multiple independent actions into one tool call. Filling several form fields, pressing a sequence of keys, or opening multiple independent pages does not always need a full model round trip between every action. Batching reduces latency and can reduce the number of observation turns that add screenshots to the context.
Do not batch actions when later steps depend on seeing the result of earlier steps. The goal is to remove unnecessary turns, not to make the agent blind.
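The batching rule can be sketched as a grouping pass over planned actions. This is an assumption-based illustration: actions are plain dicts, and the `needs_observation` flag is a hypothetical field marking steps whose result must be seen before continuing.

```python
def group_into_batches(actions: list[dict]) -> list[list[dict]]:
    """Split actions into batches that can each go out as one tool call.

    An action flagged `needs_observation` closes its batch, so the agent
    gets a fresh screenshot before any later step runs.
    """
    batches: list[list[dict]] = []
    current: list[dict] = []
    for action in actions:
        current.append(action)
        if action.get("needs_observation", False):
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


plan = [
    {"type": "fill", "field": "name"},
    {"type": "fill", "field": "email"},
    {"type": "click", "target": "submit", "needs_observation": True},
    {"type": "click", "target": "confirm"},
]
print([len(batch) for batch in group_into_batches(plan)])  # [3, 1]
```

Here the two form fills and the submit click travel together, and the confirm click waits until the agent has seen the result of submitting.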
Bottom Line
Production browser agents are not priced like simple chatbots. They carry screenshots, tool logs, long histories, and repeated instructions. The biggest savings come from screenshot pruning, prompt caching, compaction, batching, and model routing.
If you are building a UI agent or browser automation workflow, model its token cost before launch. Use the AI Cost Estimator to compare Sonnet, Opus, GPT, Gemini, and budget models for the same task shape.