What Is Model Orchestration? Using Cheap Models for Building and Expensive Models for Review
June 14, 2026 · 7 min read
Model Orchestration Defined
Model orchestration is the practice of routing different parts of an AI coding workflow to different models based on cost-quality tradeoffs. The core principle: use cheap, fast models for generation and first drafts, then route to expensive, capable models for review and verification.
This isn't new in software engineering: it mirrors code review culture, where junior developers write code and senior developers review it. Model orchestration applies the same hierarchy to AI models, aiming to keep most of the premium model's quality at a fraction of the cost. How large that fraction is depends on the price gap between the models and on your task mix.
The Economics of Build vs Review
Code generation is token-heavy. A typical feature implementation generates 2,000–10,000 output tokens across multiple iterations. Review, by contrast, requires substantial input tokens (reading the code) but minimal output (approve, reject, or suggest specific fixes). This asymmetry makes orchestration economically powerful.
| Approach (input/output price per 1M tokens) | Generation cost per task | Review cost per task | Total per task |
|---|---|---|---|
| All Opus 4.8 ($5/$25) | $0.175 | $0.075 | $0.250 |
| All Sonnet 4.6 ($3/$15) | $0.105 | $0.045 | $0.150 |
| Orchestrated: Sonnet builds, Opus reviews | $0.105 | $0.075 | $0.180 |
The orchestrated approach costs 28% less than all-Opus ($0.180 vs $0.250 per task) while retaining Opus-level verification quality. The savings scale linearly with volume: at 100 tasks per day, the $0.07 per-task difference comes to $7/day, or about $210/month per developer over a 30-day month.
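The per-task figures in the table can be reproduced with a few lines of Python. This is a minimal sketch, not the article's original calculation: the token mix (10,000 input and 5,000 output tokens for generation, 10,000 input and 1,000 output tokens for review) is an assumption, chosen because it happens to reproduce the table exactly. The names `PRICES`, `call_cost`, and `task_cost` are illustrative.

```python
# Prices in USD per million tokens (input, output), taken from the table.
PRICES = {
    "opus": (5.0, 25.0),
    "sonnet": (3.0, 15.0),
}

# ASSUMPTION: token counts per call are not stated in the article.
# These values were chosen because they reproduce the table's figures.
GENERATION_TOKENS = (10_000, 5_000)  # (input, output)
REVIEW_TOKENS = (10_000, 1_000)      # (input, output)


def call_cost(model: str, tokens: tuple[int, int]) -> float:
    """Cost in USD of one model call for the given (input, output) token counts."""
    in_price, out_price = PRICES[model]
    in_tokens, out_tokens = tokens
    return round((in_tokens * in_price + out_tokens * out_price) / 1_000_000, 4)


def task_cost(builder: str, reviewer: str) -> float:
    """Cost in USD of one task: one generation call plus one review call."""
    total = call_cost(builder, GENERATION_TOKENS) + call_cost(reviewer, REVIEW_TOKENS)
    return round(total, 4)


if __name__ == "__main__":
    for label, builder, reviewer in [
        ("All Opus", "opus", "opus"),
        ("All Sonnet", "sonnet", "sonnet"),
        ("Sonnet builds, Opus reviews", "sonnet", "opus"),
    ]:
        print(f"{label}: ${task_cost(builder, reviewer):.3f} per task")
```

Swapping in your own measured token counts is the quickest way to check whether the asymmetry between generation and review holds for your workload.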
Real-World Orchestration Patterns
Pattern 1: /architect mode. Tools like Claude Code's /architect command use a premium model (Opus 4.8) to plan the approach, then hand execution to a cheaper model (Sonnet 4.6). The expensive model sees the full context once and produces a plan; the cheap model executes mechanically.
Pattern 2: Sub-agent delegation. A primary agent running Opus spawns sub-agents on Sonnet for parallelizable tasks — searching codebases, writing tests, generating boilerplate. The primary agent reviews outputs before committing changes.
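A rough sketch of the delegation shape, using only the Python standard library. The `worker` and `reviewer` callables stand in for real model calls and are hypothetical; no particular vendor API is assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def delegate(
    subtasks: list[str],
    worker: Callable[[str], str],          # hypothetical cheap-model call
    reviewer: Callable[[str, str], bool],  # hypothetical premium-model check
    max_workers: int = 4,
) -> dict[str, str]:
    """Run subtasks in parallel on the cheap worker, then keep only the
    outputs that the reviewer accepts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so outputs line up with subtasks.
        outputs = list(pool.map(worker, subtasks))
    return {
        task: output
        for task, output in zip(subtasks, outputs)
        if reviewer(task, output)
    }


if __name__ == "__main__":
    # Stub callables for illustration only.
    accepted = delegate(
        ["write tests for parser", "generate CRUD boilerplate"],
        worker=lambda task: f"draft for: {task}",
        reviewer=lambda task, output: bool(output.strip()),
    )
    print(accepted)
```

Rejected subtasks are simply dropped here; a real agent would more likely retry them or escalate them to the primary model.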
Pattern 3: Semantic routing. A lightweight classifier (often a small model or heuristic) categorizes incoming tasks by complexity. Simple tasks route to budget models directly. Complex tasks route to premium models. This eliminates the orchestration overhead for tasks that don't benefit from it.
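The heuristic version of such a classifier can be very small. The sketch below is an illustration only: the keyword list, the word-count threshold, and the tier names `"budget"` and `"premium"` are all assumptions, and a production router would more likely use a small model or signals learned from past reject rates.

```python
# ASSUMPTION: these keywords and the length threshold are illustrative,
# not taken from any specific tool.
COMPLEX_HINTS = ("refactor", "architecture", "concurrency", "migration", "security")


def route(task: str, max_simple_words: int = 40) -> str:
    """Return "premium" for tasks that look complex, otherwise "budget"."""
    text = task.lower()
    looks_complex = any(hint in text for hint in COMPLEX_HINTS)
    is_long = len(text.split()) > max_simple_words
    return "premium" if looks_complex or is_long else "budget"


if __name__ == "__main__":
    for task in [
        "Rename variable foo to bar",
        "Refactor the auth module for concurrency safety",
    ]:
        print(f"{route(task):8s} <- {task}")
```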
Pattern 4: Generate-then-verify loops. A cheap model generates code, a premium model reviews it, and if rejected, the cheap model regenerates based on feedback. This typically converges in 1–2 cycles for well-specified tasks.
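The loop itself can be sketched as follows. The `generate` and `review` callables are hypothetical stand-ins for the cheap and premium model calls, and the `max_cycles` cap is an assumption added so that the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def generate_then_verify(
    task: str,
    generate: Callable[[str, Optional[str]], str],  # cheap model: (task, feedback) -> draft
    review: Callable[[str, str], Review],           # premium model: (task, draft) -> verdict
    max_cycles: int = 3,
) -> tuple[Optional[str], int]:
    """Return (approved_draft, cycles_used), or (None, max_cycles) if no
    draft was approved within the cycle budget."""
    feedback: Optional[str] = None
    for cycle in range(1, max_cycles + 1):
        draft = generate(task, feedback)
        verdict = review(task, draft)
        if verdict.approved:
            return draft, cycle
        # Feed the reviewer's comments into the next generation attempt.
        feedback = verdict.feedback
    return None, max_cycles
```

Returning the cycle count makes it easy to track how often a task category converges in one or two cycles, and to fall back to premium generation when it does not.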
Cost Savings Math: A Worked Example
Consider a team running 500 coding tasks per week. Current approach: all tasks on Opus 4.8 at an average cost of $0.25/task = $125/week.
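With orchestration: 400 routine tasks on Sonnet 4.6 at $0.15/task ($60) + 100 complex tasks on Opus 4.8 at $0.25/task ($25) + an Opus review pass on all 500 tasks at $0.075/task ($37.50) = $122.50/week. That is only a 2% saving, because Opus still reviews everything, including code it generated itself.
Optimized orchestration: review only the Sonnet outputs with Opus and skip the extra review for Opus-generated code. 400 × $0.15 + 400 × $0.075 + 100 × $0.25 = $60 + $30 + $25 = $115/week. That's an 8% saving, with unchanged quality on complex tasks and Opus-verified quality on routine tasks. This estimate is conservative: the $0.15 and $0.25 per-task figures are the all-in totals from the table above, so they already include a same-model review pass. Using the table's generation-only Sonnet cost of $0.105 instead gives 400 × $0.105 + 400 × $0.075 + 100 × $0.25 = $42 + $30 + $25 = $97/week, roughly a 22% saving.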
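The real savings come from using even cheaper models for generation. With GLM-5.2 (free) for initial generation and Sonnet 4.6 for review: 400 × $0 + 400 × $0.045 + 100 × $0.25 = $0 + $18 + $25 = $43/week, about a 66% reduction from the $125 baseline. Note that this configuration also swaps the reviewer from Opus to Sonnet, so routine tasks no longer get Opus-level verification.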
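The weekly scenarios above all follow one formula, which makes them easy to recompute for a different task mix. The sketch below uses the article's per-task figures as inputs; the function and parameter names are illustrative.

```python
def weekly_cost(
    routine_tasks: int,
    complex_tasks: int,
    routine_gen: float,           # generation cost per routine task (USD)
    routine_review: float,        # extra review cost per routine task (USD)
    complex_cost: float,          # cost per complex task on the premium model (USD)
    complex_review: float = 0.0,  # extra review cost per complex task (USD)
) -> float:
    """Total weekly cost in USD for a mix of routine and complex tasks."""
    routine_total = routine_tasks * (routine_gen + routine_review)
    complex_total = complex_tasks * (complex_cost + complex_review)
    return round(routine_total + complex_total, 2)


def savings_pct(baseline: float, cost: float) -> float:
    """Percentage saved relative to the baseline, rounded to one decimal."""
    return round((baseline - cost) / baseline * 100, 1)


if __name__ == "__main__":
    baseline = weekly_cost(400, 100, 0.25, 0.0, 0.25)
    scenarios = {
        "Opus reviews everything": weekly_cost(400, 100, 0.15, 0.075, 0.25, complex_review=0.075),
        "Opus reviews Sonnet output only": weekly_cost(400, 100, 0.15, 0.075, 0.25),
        "Free generation, Sonnet review": weekly_cost(400, 100, 0.0, 0.045, 0.25),
    }
    print(f"Baseline (all Opus): ${baseline:.2f}/week")
    for name, cost in scenarios.items():
        print(f"{name}: ${cost:.2f}/week ({savings_pct(baseline, cost)}% saved)")
```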
When Orchestration Helps
Orchestration delivers the most value when: tasks are clearly separable into generation and verification phases; your task mix includes both routine and complex work; you have high volume (savings compound with scale); and the cheap model's output is "close enough" that review catches issues faster than regeneration.
When Orchestration Hurts
Orchestration adds cost when: tasks are uniformly complex (you still need the premium model for generation); the cheap model's output quality is so low that review becomes a full rewrite; latency matters more than cost (two model calls instead of one); or your volume is too low for the infrastructure complexity to pay off. For a solo developer running 10 tasks/day, the savings may not justify the setup overhead.
Getting Started
Start simple: use your current premium model as the reviewer and add a cheaper model for generation on routine tasks only. Measure the reject rate — if the premium model approves 80%+ of cheap model outputs without changes, orchestration is working. If it's rejecting more than 40%, the cheap model isn't suitable for that task category. Use the AI Cost Estimator to compare costs across different orchestration configurations.
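The reject-rate check described above can be sketched as a small helper. The 80% and 40% thresholds come from the text; the `"borderline"` label for results that fall between them is an assumption, since the text does not say how to treat that range.

```python
def assess(approved: int, total: int) -> str:
    """Classify a task category from the premium reviewer's verdicts.

    approved: outputs accepted without changes
    total:    outputs reviewed
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= approved <= total:
        raise ValueError("approved must be between 0 and total")
    rejected = total - approved
    # Integer comparisons avoid floating-point edge cases at the thresholds.
    if approved * 100 >= 80 * total:
        return "working"
    if rejected * 100 > 40 * total:
        return "unsuitable"
    return "borderline"  # ASSUMPTION: label for the unspecified middle range


if __name__ == "__main__":
    for approved, total in [(85, 100), (70, 100), (50, 100)]:
        print(f"{approved}/{total} approved -> {assess(approved, total)}")
```

Tracking this per task category rather than globally shows which kinds of work the cheap model can handle and which should go straight to the premium model.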
Frequently Asked Questions
What is model orchestration in AI coding?
Model orchestration routes different workflow steps to different AI models — typically using cheap models ($3/M input) for code generation and expensive models ($5/M input) for review and verification.
How much can model orchestration save?
Depending on your task mix, the examples in this article range from 8% to 66% savings compared to using a premium model for all tasks, with the basic Sonnet-builds, Opus-reviews split saving 28% per task. Savings scale with volume and the price gap between your generation and review models.
Which models work best for the 'cheap generation' role?
Sonnet 4.6 ($3/$15), GLM-5.2 (free), and DeepSeek models offer strong generation quality at low cost. The best choice depends on your accuracy requirements and acceptable reject rate.
Does orchestration add latency?
Yes — typically 2-5 seconds per task for the additional review call. For interactive coding this is noticeable; for batch operations or CI pipelines it's negligible.
When should I NOT use model orchestration?
Skip orchestration when all your tasks require premium reasoning, when volume is under 10 tasks/day, or when the cheap model's reject rate exceeds 40% for your task type.