
Claude Opus 4.7 Finishes Robotics Tasks 20× Faster With 10× Less Code: The Cost-Per-Task Story

June 20, 2026 · 8 min read


The Result

Anthropic published phase two of its Project Fetch experiment on June 20, 2026. In the original August 2025 run, a human team equipped with Claude Opus 4.1 significantly outperformed a team with no AI when operating a quadruped robot. In the new phase, Claude Opus 4.7 completed all tasks with no human assistance: roughly 20× faster than the fastest human team, over 37× faster than the no-Claude team, and with nearly 10× less code.

The model excelled at sensor integration and path planning, while still struggling with precise closed-loop control tasks like nudging a beach ball accurately. Anthropic's framing is the most interesting part: these gains came from general-purpose model scaling, not robotics-specific tuning.

This is a robotics headline, but the cost lesson generalizes directly to AI coding. "Faster and with less code" is, in token terms, "cheaper per task" — and it reframes how you should think about paying for a frontier model.

Why "Less Code, Faster" Means "Cheaper Per Task"

Token cost scales with how much a model reads and writes to finish a job. A model that solves a task with nearly 10× less code and far fewer steps emits fewer output tokens and burns fewer iterations to get there, which also means it re-reads its context less often. The per-token price can stay flat, or even rise, while the cost to complete a given task falls, because the model needs so much less back-and-forth.

This is the single most under-appreciated dynamic in AI coding economics. Developers fixate on sticker price, the per-million-token rate, and miss that capability is a cost lever. Claude Opus 4.8 at $5/$25 per million input/output tokens looks expensive next to DeepSeek V4 Pro at $0.435/$0.87. But if the more capable model finishes a complex task in one clean pass while the cheaper one needs five attempts, three of which produce broken code that someone has to diagnose and discard, the "expensive" model can be cheaper per completed task once that cleanup time is counted.

Sticker Price vs. Cost Per Task

Consider a concrete example. Say a non-trivial refactor needs roughly 50,000 input tokens of context either way. A frontier model nails it in one pass, emitting 8,000 output tokens. A cheaper model needs three passes — re-reading context each time — and emits 20,000 total output tokens across the attempts, plus your time fixing the failures.

Frontier (Opus 4.8): 50K × $5/M + 8K × $25/M = $0.25 + $0.20 = $0.45.
Cheaper model, three passes: 150K × $0.435/M + 20K × $0.87/M = $0.06525 + $0.0174 ≈ $0.083.

On tokens alone, the cheaper model still wins this particular example — which is exactly why budget models are so popular for routine work. But the calculation flips the moment failures cost your time, or the task is hard enough that the cheap model never converges. The lesson from Project Fetch is that capability gaps on hard, multi-step tasks are large, and on those tasks the frontier model's single clean pass beats a pile of cheap retries.
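The arithmetic above can be reproduced in a few lines of Python. This is a rough sketch, not a billing tool: the prices and token counts come from the example, while the function names, the 10 minutes of cleanup, and the $60/hour rate are assumptions chosen purely for illustration.

```python
# Prices in USD per million tokens, taken from the example above.
FRONTIER = {"input": 5.00, "output": 25.00}
BUDGET = {"input": 0.435, "output": 0.87}


def token_cost(prices, input_tokens, output_tokens):
    """Token spend in USD for one task, summed over all passes."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000


def task_cost(prices, input_tokens, output_tokens,
              cleanup_minutes=0.0, hourly_rate=0.0):
    """Token spend plus the cost of human time spent on failed attempts."""
    human_cost = cleanup_minutes / 60 * hourly_rate
    return token_cost(prices, input_tokens, output_tokens) + human_cost


def break_even_minutes(cheap_cost, frontier_cost, hourly_rate):
    """Minutes of cleanup at which the cheaper model stops being cheaper."""
    return (frontier_cost - cheap_cost) / (hourly_rate / 60)


frontier = task_cost(FRONTIER, 50_000, 8_000)   # one pass
budget = task_cost(BUDGET, 150_000, 20_000)     # three passes, tokens only

# Hypothetical: 10 minutes of cleanup at $60/hour (both numbers are assumed).
budget_with_cleanup = task_cost(BUDGET, 150_000, 20_000,
                                cleanup_minutes=10, hourly_rate=60)

print(f"frontier:            ${frontier:.4f}")
print(f"budget, tokens only: ${budget:.4f}")
print(f"budget + cleanup:    ${budget_with_cleanup:.4f}")
print(f"break-even cleanup:  {break_even_minutes(budget, frontier, 60):.2f} min")
```

Under these assumed numbers, the token gap between the two models is worth well under a minute of engineer time, which is why the comparison is so sensitive to how often the cheaper model fails.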

How to Apply This to Your Budget

Measure cost per completed task, not cost per token. The number that matters is dollars per merged PR or per shipped feature, including retries and human cleanup. Track that, and the right model often becomes obvious in a way the price list alone never shows.
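One way to track that number is to log every attempt, failed or not, and divide total spend by the tasks that actually shipped. The sketch below assumes a hypothetical `Attempt` record and made-up figures; in practice the fields would come from your own API usage logs and time tracking.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One model attempt at a task (hypothetical record layout)."""
    task_id: str
    token_cost_usd: float
    human_minutes: float   # review or cleanup time spent on this attempt
    merged: bool           # True if this attempt produced the merged PR


def cost_per_completed_task(attempts, hourly_rate_usd):
    """Total spend (tokens + human time) divided by tasks that were merged.

    Failed attempts still count toward the total, so retries and
    abandoned tasks raise the cost of the ones that shipped.
    """
    total = sum(a.token_cost_usd + a.human_minutes / 60 * hourly_rate_usd
                for a in attempts)
    completed = {a.task_id for a in attempts if a.merged}
    if not completed:
        return None  # nothing shipped, so the metric is undefined
    return total / len(completed)


# Illustrative log: all values are invented for the example.
log = [
    Attempt("pr-1", 0.03, 5, False),
    Attempt("pr-1", 0.03, 5, False),
    Attempt("pr-1", 0.03, 0, True),
    Attempt("pr-2", 0.45, 0, True),
]
result = cost_per_completed_task(log, hourly_rate_usd=60)
print(f"${result:.2f} per merged PR")
```

Computing this per model, rather than across all attempts, gives the per-model comparison the paragraph above describes.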

Route by task difficulty. Use cheap models for the boilerplate they handle in one shot, and reserve frontier models for the hard, multi-step problems where their fewer-iterations advantage pays off. Mixed routing beats picking one model for everything.
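A routing rule can be sketched as picking the model with the lowest expected cost per completed task. The version below assumes each attempt succeeds independently with a fixed probability, so the expected number of attempts is 1/p. The per-attempt costs, success probabilities, and the $5 cost of handling a failed attempt are all hypothetical placeholders that you would replace with your own measurements.

```python
# Hypothetical token cost per attempt, in USD.
ATTEMPT_COST = {"budget": 0.03, "frontier": 0.45}


def expected_cost(attempt_cost, success_prob, failure_cost=0.0):
    """Expected spend until the first success.

    Assumes independent attempts with the same success probability,
    so attempts follow a geometric distribution with mean 1/p.
    """
    if not 0 < success_prob <= 1:
        raise ValueError("success_prob must be in (0, 1]")
    expected_attempts = 1 / success_prob
    expected_failures = expected_attempts - 1
    return attempt_cost * expected_attempts + failure_cost * expected_failures


def route(success_probs, failure_cost):
    """Return the model name with the lowest expected cost for this task."""
    return min(
        success_probs,
        key=lambda m: expected_cost(ATTEMPT_COST[m], success_probs[m], failure_cost),
    )


# Assumed success rates per task class (illustrative only).
boilerplate = {"budget": 0.95, "frontier": 0.99}
hard_refactor = {"budget": 0.30, "frontier": 0.90}

print(route(boilerplate, failure_cost=5.0))
print(route(hard_refactor, failure_cost=5.0))
```

With these made-up inputs the budget model wins on boilerplate and the frontier model wins on the hard refactor. The choice depends heavily on the estimated success rates, so they are worth measuring rather than guessing.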

Re-evaluate when capability jumps. Every time a model gets meaningfully better at finishing tasks in fewer steps, your cost-per-task math shifts even if prices don't. A capability release is a budgeting event, not just a feature announcement.

Project Fetch is about robots, but its lesson is universal: as models get better at doing more with less, the meaningful cost metric moves from price-per-token to price-per-outcome. Use our cost calculator to estimate both, and decide where a frontier model's efficiency earns its premium.

Frequently Asked Questions

Does a more capable model always cost more?

Per token, often yes. But cost per completed task can be lower for a more capable model, because it finishes hard, multi-step jobs in fewer iterations and less code — emitting fewer output tokens and avoiding the failed retries that inflate a cheaper model's total spend.

What did Project Fetch phase two show?

Claude Opus 4.7 completed robotics tasks autonomously — roughly 20× faster than the fastest human team and with nearly 10× less code — driven by general-purpose model scaling rather than robotics-specific tuning. It still struggled with precise closed-loop control like nudging a beach ball.

Should I switch to frontier models for everything?

No. Route by task difficulty: cheap models handle boilerplate they solve in one shot, while frontier models are worth their premium on hard, multi-step problems where their fewer-iterations advantage offsets the higher per-token price. Mixed routing beats one model for all work.

How do I measure cost per task?

Track dollars per merged PR or shipped feature, including retries and human cleanup time — not just per-token rates. That metric captures the efficiency advantage of capable models and often makes the right choice obvious in ways a price list alone cannot.
