Claude Computer Use Best Practices: How Screenshot Pruning and Prompt Caching Lower Agent Costs

By Eric Bush · May 20, 2026 · 7 min read

Earth from space showing digital network connections

Browser Agents Spend Money Through Screenshots

Claude's guidance for production computer and browser use makes one cost driver very clear: screenshots are expensive context. A single screenshot can add roughly 1,000-1,800 tokens, and a 200k context window can fill in under 100 screenshots. For browser agents that inspect UI state every turn, visual context can become the largest part of the bill.

This is why UI agents need cost engineering, not just prompt engineering. If an agent keeps every screenshot, every tool result, and every failed click in the conversation, costs rise even when the actual task is simple. The fix is to manage screenshots, context, caching, and model choice as first-class parts of the architecture.

Start With the Right Resolution

High-resolution screenshots are not always better. Claude's guidance recommends starting around 1280×720 for most Claude 4.6-family computer-use tasks, and around 1080p for Opus 4.7. Screenshots that exceed model or API limits may be downscaled internally, which can reduce click accuracy while still wasting tokens and bandwidth.

For cost-sensitive browser automation, the rule is simple: send the smallest screenshot that preserves the UI details needed for the decision. If the task involves a tiny control or a dense table, zoom or crop the relevant region instead of sending a full 4K desktop image.

Use Cache-Aware Screenshot Pruning

The most practical technique is screenshot pruning. Keep only a few recent screenshots, such as the last three, and replace older screenshots with short text placeholders. This preserves the history shape while removing the largest token blocks.

Claude's guidance also suggests pruning older screenshots in batches instead of one at a time. That is important for prompt caching: if you mutate the conversation prefix every turn, you can break cache reuse. Batch pruning every 25 turns, for example, keeps the stable prefix more stable and improves the odds of cache hits.

Strategy	What it saves	Best for
Keep last 3 screenshots	Visual context tokens	Long browser sessions
Batch prune every 25 turns	Cache invalidation	Cached agent loops
Replace images with summaries	Old image tokens	Auditable history

Prompt Caching Is the Main Cost Lever

Browser agents often have a large stable prefix: system instructions, safety rules, tool descriptions, application context, and workflow constraints. Prompt caching can make that repeated prefix cheaper when it stays stable. Claude's guidance recommends using cache breakpoints on the stable prefix and recent tool results, then clearing and replacing trailing cache markers each turn.

The practical design goal is to keep stable content stable. Put long-lived instructions first, dynamic observations later, and avoid rewriting the system prompt on every step. Cache-aware message layout can be the difference between a browser agent that is affordable and one that burns budget on repeated context.

Model Routing: Sonnet for Default, Opus for Hard Cases

Model choice directly affects UI agent cost. Claude's guidance positions Sonnet 4.6 as the default balance of click precision, reasoning, and cost, with Opus 4.7 reserved for harder reasoning or high-resolution sources. The current estimator prices Claude Sonnet 4.6 at $3.00 per million input tokens and $15.00 per million output tokens, while Claude Opus 4.7 is $5.00 input and $25.00 output.

A planner/executor split can reduce cost: use a cheaper model for mechanical UI steps and escalate to a stronger model only when the agent gets stuck, needs recovery, or must make a complex decision. This is especially effective in browser workflows where many turns are simple observations and clicks.

Batch Mechanical Actions

Browser and computer batch tools can combine multiple independent actions into one tool call. Filling several form fields, pressing a sequence of keys, or opening multiple independent pages does not always need a full model round trip between every action. Batching reduces latency and can reduce the number of observation turns that add screenshots to the context.

Do not batch actions when later steps depend on seeing the result of earlier steps. The goal is to remove unnecessary turns, not to make the agent blind.

Bottom Line

Production browser agents are not priced like simple chatbots. They carry screenshots, tool logs, long histories, and repeated instructions. The biggest savings come from screenshot pruning, prompt caching, compaction, batching, and model routing.

If you are building a UI agent or browser automation workflow, model its token cost before launch. Use the AI Cost Estimator to compare Sonnet, Opus, GPT, Gemini, and budget models for the same task shape.

Want to calculate exact costs for your project?

Estimate Your AI Coding Costs →Compare Token Pricing →

Prompt Caching with Deep Agents: How Teams Cut Agent Token Costs by 41-80%

LangChain says prompt caching with deep agents can reduce costs by 41-80% depending on setup. This guide explains what gets cached, why provider behavior differs, and how to calculate real savings for AI coding agents.

Prompt Caching Across Claude, GPT, and Gemini: A 2026 Cost-Saving Playbook for Coding Agents

Prompt caching is the single biggest cost lever for AI coding agents in 2026 — but every provider implements it differently. We compare Anthropic's explicit breakpoints, OpenAI's new GPT-5.6 30-minute contract, and Gemini's implicit prefix caching. Numbers, decision rules, and the migration trade-offs for switching between them.

Prompt Caching Explained: How to Cut Your AI Coding Costs by Up to 90%

Learn how prompt caching works and why cached input tokens cost 90% less. We break down Anthropic's caching, provider support, and practical tips for maximizing cache hits.

← Previous

Claude Code v2.1.145 Adds Agent JSON and Better OTEL Traces: Why Observability Matters for AI Coding Spend

Google’s $100 AI Ultra Plan: Is 5× More Antigravity Usage Worth It for Developers?