OpenRouter Advisor: Let Cheap Models Call Expensive Ones Only When Needed

By Eric Bush · June 10, 2026 · 7 min read


The Economics of Model Routing

OpenRouter has launched Advisor, a new feature that allows a budget model to consult a frontier model mid-generation, but only when the cheaper model detects it needs help. Think of it as giving Claude Haiku ($1/$5 per million input/output tokens) a phone-a-friend lifeline to Claude Opus ($5/$25). The budget model handles what it can and escalates the hard parts.

The core insight is simple: not every token in a response needs frontier-level reasoning. When generating a 2,000-token code implementation, maybe 200 tokens involve genuinely hard decisions (algorithm choice, edge case handling, architecture calls). The other 1,800 tokens are boilerplate, syntax, and straightforward logic that any model handles correctly. Why pay Opus pricing for all 2,000?
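To put numbers on that, here is the output-token cost of a single 2,000-token response if only the 200 hard tokens were billed at Opus rates. This is an idealized split that ignores input tokens and the overhead of consulting the advisor:

```python
# Output-token prices in $ per million tokens, as quoted above
HAIKU_OUT = 5.0
OPUS_OUT = 25.0

def output_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of `tokens` output tokens at a per-million-token price."""
    return tokens * price_per_million / 1_000_000

total_tokens, hard_tokens = 2_000, 200

# Every token at Opus rates
all_opus = output_cost(total_tokens, OPUS_OUT)
# Routine tokens at Haiku rates, only the hard ones at Opus rates
split = output_cost(total_tokens - hard_tokens, HAIKU_OUT) + output_cost(hard_tokens, OPUS_OUT)

print(f"All Opus: ${all_opus:.3f}")  # All Opus: $0.050
print(f"Split:    ${split:.3f}")     # Split:    $0.014
```

Even in this idealized case, the split response costs under a third of the all-Opus one, which is the gap Advisor is trying to capture.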

How Advisor Works

The mechanism is straightforward: you set a primary model (cheap) and an advisor model (expensive). The primary model generates normally, but when its confidence drops below a configurable threshold — or when it encounters patterns it recognizes as beyond its capability — it pauses, sends context to the advisor, receives guidance, and continues generating.

The advisor doesn't generate the full response. It provides targeted guidance: "use a semaphore here, not a mutex" or "this edge case needs null coalescing." The primary model then incorporates that guidance into its ongoing generation. The result reads as a single coherent response, but the expensive model only processed the genuinely difficult portions.

You control the escalation sensitivity. Higher sensitivity means more advisor consultations (higher quality, higher cost). Lower sensitivity means the cheap model handles more independently (lower cost, occasionally lower quality on hard sections).
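The control flow described above can be sketched as a simple loop. Everything here is an illustrative assumption: the per-chunk confidence score, the `Step` type, and the `ask_advisor`/`revise` callbacks are hypothetical stand-ins, not OpenRouter's actual interface or escalation logic. The threshold plays the role of the sensitivity setting:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Step:
    """One chunk drafted by the primary model, with a hypothetical
    self-reported confidence score in [0, 1]."""
    text: str
    confidence: float

def generate_with_advisor(
    steps: List[Step],
    ask_advisor: Callable[[str, str], str],
    revise: Callable[[str, str], str],
    threshold: float,
) -> Tuple[str, int]:
    """Assemble a response, consulting the advisor on low-confidence steps.

    A higher threshold means higher sensitivity: more steps fall below it,
    so more consultations happen. Returns (response, consultation_count).
    """
    output: List[str] = []
    consultations = 0
    for step in steps:
        text = step.text
        if step.confidence < threshold:
            # Pause: send the response so far plus the uncertain draft
            guidance = ask_advisor("".join(output), text)
            # The primary model folds the guidance into its own draft
            text = revise(text, guidance)
            consultations += 1
        output.append(text)
    return "".join(output), consultations

# Stub "models" for illustration only
def stub_advisor(context: str, draft: str) -> str:
    return "use a semaphore here, not a mutex"

def stub_revise(draft: str, guidance: str) -> str:
    if "semaphore" in guidance:
        return draft.replace("threading.Lock()", "threading.Semaphore(4)")
    return draft

steps = [
    Step("def run(jobs):\n", 0.95),
    Step("    guard = threading.Lock()\n", 0.40),
    Step("    return [job() for job in jobs]\n", 0.70),
]

for threshold in (0.5, 0.8):
    response, n = generate_with_advisor(steps, stub_advisor, stub_revise, threshold)
    print(f"threshold={threshold}: {n} consultation(s)")
print(response)
```

Raising the threshold from 0.5 to 0.8 pulls the 0.70-confidence step into the escalation path as well, which is the quality-versus-cost dial described above.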

The Math: When 10-20% Escalation Saves 60-80%

Let's model a typical coding session. Assume 100 tasks per day, averaging 500 input + 2,000 output tokens each. Here's what different approaches cost:

Strategy	Daily Cost	Monthly Cost	Savings vs Pure Opus
100% Opus 4.8	$5.25	$115.50	—
100% Sonnet 4.6	$3.15	$69.30	40%
Advisor: Haiku + Opus (20% escalation)	$1.89	$41.58	64%
Advisor: Haiku + Opus (10% escalation)	$1.47	$32.34	72%
Advisor: Haiku + Sonnet (15% escalation)	$1.37	$30.03	74%
100% Haiku 4.5	$1.05	$23.10	80%

The "Advisor: Haiku + Opus (20% escalation)" row is the sweet spot for most developers. You get Opus-quality reasoning on the hard parts while paying Haiku rates on everything else. Monthly cost drops from $115.50 to $41.58, a 64% reduction, while maintaining high quality where it matters.
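The pure-model and Haiku + Opus rows can be reproduced with a simple blended-cost model. The sketch below assumes, as the table appears to, that an escalation rate of r means a fraction r of each task's tokens is billed at the advisor's rates, and that a month has 22 working days (which is what the monthly column implies):

```python
# Per-million-token prices (input, output) quoted in the article
PRICES = {
    "haiku": (1.0, 5.0),
    "sonnet": (3.0, 15.0),
    "opus": (5.0, 25.0),
}
TASKS_PER_DAY = 100
INPUT_TOKENS, OUTPUT_TOKENS = 500, 2_000
WORKDAYS_PER_MONTH = 22  # implied by the table's monthly column

def daily_cost(model: str) -> float:
    """Daily cost if every token of every task is billed at `model` rates."""
    inp, out = PRICES[model]
    per_task = (INPUT_TOKENS * inp + OUTPUT_TOKENS * out) / 1_000_000
    return TASKS_PER_DAY * per_task

def blended_cost(primary: str, advisor: str, escalation: float) -> float:
    """Assumption: `escalation` is the share of tokens billed at advisor rates."""
    return (1 - escalation) * daily_cost(primary) + escalation * daily_cost(advisor)

baseline = daily_cost("opus")
rows = [
    ("100% Opus", baseline),
    ("100% Sonnet", daily_cost("sonnet")),
    ("Haiku + Opus (20%)", blended_cost("haiku", "opus", 0.20)),
    ("Haiku + Opus (10%)", blended_cost("haiku", "opus", 0.10)),
    ("100% Haiku", daily_cost("haiku")),
]
for name, day in rows:
    saving = 1 - day / baseline
    print(f"{name:<20} ${day:.2f}/day  ${day * WORKDAYS_PER_MONTH:.2f}/month  {saving:.0%} saved")
```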

Breaking Down the Escalation Math

Here's how the 20% escalation cost works in detail. For each task (500 input + 2,000 output tokens):

Primary model (Haiku, all tokens): 500 input tokens × $1/M + 2,000 output tokens × $5/M = $0.0005 + $0.01 = $0.0105 per task.

Advisor model (Opus, 20% of tasks): On escalated portions, roughly 300 tokens of context sent to Opus + 100 tokens of guidance received. Per escalation: 300 × $5/M + 100 × $25/M = $0.0015 + $0.0025 = $0.004. At 20% escalation rate across 100 tasks: 20 × $0.004 = $0.08.

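Total daily: $1.05 (Haiku base) + $0.08 (Opus consultations) = $1.13/day, compared with $5.25/day for pure Opus. That is well below the $1.89 shown in the table, which uses a more conservative blended model: 80% of each task's tokens billed at Haiku rates and 20% at Opus rates (0.8 × $1.05 + 0.2 × $5.25 = $1.89/day). Where real usage lands between the two depends on how much context each escalation sends to the advisor.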
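Because each consultation forwards some slice of the working context, the per-consultation estimate is sensitive to how large that slice is. A small calculator for this model follows; the 100-token guidance size comes from the breakdown above, while the swept context sizes are illustrative assumptions rather than measured Advisor behavior:

```python
from typing import Tuple

# Prices in $ per million tokens (input, output), as quoted in the article
HAIKU: Tuple[float, float] = (1.0, 5.0)
OPUS: Tuple[float, float] = (5.0, 25.0)

def cost(tokens_in: int, tokens_out: int, price: Tuple[float, float]) -> float:
    """Dollar cost of one call given input/output token counts."""
    return (tokens_in * price[0] + tokens_out * price[1]) / 1_000_000

def daily_total(tasks: int, escalation_rate: float, context_tokens: int,
                guidance_tokens: int = 100) -> float:
    """Haiku generates every task; each escalated task adds one Opus consultation."""
    base = tasks * cost(500, 2_000, HAIKU)
    escalations = round(tasks * escalation_rate)
    consults = escalations * cost(context_tokens, guidance_tokens, OPUS)
    return base + consults

per_escalation = cost(300, 100, OPUS)  # the $0.004 per consultation worked out above
for ctx in (300, 3_000, 10_000):
    print(f"{ctx:>6} context tokens: ${daily_total(100, 0.20, ctx):.2f}/day")
```

Forwarding a few hundred tokens keeps escalation nearly free; forwarding most of a large working context per consultation pushes the daily total up quickly, which is why a blended model is the safer planning figure.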

Quality Analysis: What Gets Lost?

The critical question: does quality actually hold up? Based on early benchmarks from OpenRouter's testing:

At 20% escalation: Output quality measures at approximately 90-93% of pure Opus quality on coding benchmarks. The primary loss comes from cases where the cheap model doesn't recognize it needs help — it's confident but wrong.

At 10% escalation: Quality drops to about 85-88% of Opus. More errors slip through because fewer hard decisions get escalated.

At 30%+ escalation: Diminishing returns. You approach Opus quality but costs rise toward just using Sonnet directly, which is simpler.

The sweet spot appears to be 15-20% escalation — capturing the genuinely hard decisions while keeping the easy ones cheap.
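Under the blended-cost model behind the savings table, the point where escalating to Opus stops beating a straight switch to Sonnet can be solved for directly: set (1 - r) × C_Haiku + r × C_Opus = C_Sonnet and solve for r. A quick sketch, assuming that model holds:

```python
def break_even_rate(primary_daily: float, advisor_daily: float, target_daily: float) -> float:
    """Escalation rate r at which (1 - r) * primary + r * advisor equals target.

    Assumes the same blended-cost model as the savings table.
    """
    return (target_daily - primary_daily) / (advisor_daily - primary_daily)

# Daily costs for 100 tasks, from the savings table
HAIKU, SONNET, OPUS = 1.05, 3.15, 5.25

r = break_even_rate(HAIKU, OPUS, SONNET)
print(f"Haiku + Opus costs as much as pure Sonnet at {r:.0%} escalation")
for rate in (0.10, 0.20, 0.30):
    print(f"{rate:.0%} escalation: ${(1 - rate) * HAIKU + rate * OPUS:.2f}/day")
```

With the table's daily figures the break-even lands at 50%, so 30% escalation ($2.31/day) is still cheaper than pure Sonnet, but it gives up much of the margin that makes 15-20% attractive.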

Best Configurations for Coding

Different coding tasks benefit from different Advisor configurations:

Bug fixing: Haiku primary + Opus advisor at 25% sensitivity. Bug diagnosis often requires deeper reasoning that Haiku struggles with. Higher escalation rate is worth it here.

Feature implementation: Haiku primary + Sonnet advisor at 15% sensitivity. Most feature code is straightforward; Sonnet handles the occasional architectural question without needing Opus.

Test writing: Haiku primary + Sonnet advisor at 10% sensitivity. Test generation is highly templated — Haiku handles it well independently, with rare escalations for complex mocking scenarios.

Code review: Sonnet primary + Opus advisor at 20% sensitivity. Review requires nuanced judgment; starting from Sonnet ensures baseline quality, with Opus catching subtle issues.
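If you route requests by task type yourself, these recommendations fit naturally in a lookup table. This is a hypothetical client-side structure (the class, field names, short model labels, and the fallback to the feature-implementation settings are all assumptions), not a documented Advisor request format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdvisorConfig:
    primary: str        # model that generates every token
    advisor: str        # model consulted on hard sections
    sensitivity: float  # escalation sensitivity, 0.0 to 1.0

# Hypothetical routing table mirroring the recommendations above;
# model names are placeholders, not OpenRouter model IDs.
CONFIGS = {
    "bug_fixing": AdvisorConfig("haiku", "opus", 0.25),
    "feature_implementation": AdvisorConfig("haiku", "sonnet", 0.15),
    "test_writing": AdvisorConfig("haiku", "sonnet", 0.10),
    "code_review": AdvisorConfig("sonnet", "opus", 0.20),
}

def config_for(task_type: str) -> AdvisorConfig:
    """Look up the configuration for a task type, defaulting to feature work."""
    return CONFIGS.get(task_type, CONFIGS["feature_implementation"])

print(config_for("code_review"))
```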

Comparison With Other Routing Approaches

Advisor isn't the first model routing system. How does it compare?

vs Static routing (Martian, Unify): Static routers choose one model per request before generation starts. Advisor is more granular — it can escalate mid-response for specific sections. This means better cost optimization on long, multi-part responses.

vs Cascading (try cheap, fall back to expensive): Cascading wastes the cheap model's tokens entirely on failure. Advisor never discards work — the cheap model's output is always used, just enhanced at specific points.

vs Manual model selection: Most developers don't want to choose a model per-task. Advisor automates the decision at a more granular level than humans would manage manually.
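The cost of the cascade's discarded work is easy to see with a simplified comparison. This sketch assumes a failed cascade attempt costs a full cheap-model run plus a full expensive-model run, sets the cascade's failure rate equal to Advisor's escalation rate purely for illustration, and reuses the savings table's blended model for Advisor:

```python
def cascade_cost(cheap: float, expensive: float, fail_rate: float) -> float:
    """Try the cheap model on every task; rerun failures on the expensive one.
    The cheap model's tokens on failed tasks are paid for but thrown away."""
    return cheap + fail_rate * expensive

def advisor_cost(cheap: float, expensive: float, escalation: float) -> float:
    """Blended model: a share of tokens billed at expensive rates, none discarded."""
    return (1 - escalation) * cheap + escalation * expensive

HAIKU, OPUS = 1.05, 5.25  # daily costs for 100 tasks, from the savings table
rate = 0.20
print(f"Cascade: ${cascade_cost(HAIKU, OPUS, rate):.2f}/day")  # Cascade: $2.10/day
print(f"Advisor: ${advisor_cost(HAIKU, OPUS, rate):.2f}/day")  # Advisor: $1.89/day
```

The $0.21/day gap is exactly the cheap-model spend on the 20% of tasks the cascade throws away (0.2 × $1.05).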

Real-World Example: A Day of Coding

Consider a typical development day: 30 feature implementation tasks, 20 bug fixes, 30 test-writing tasks, and 20 code review tasks. Using task-appropriate Advisor configurations:

Task Type	Count	Pure Opus Cost	Advisor Cost
Feature implementation	30	$1.58	$0.45
Bug fixing	20	$1.05	$0.42
Test writing	30	$1.58	$0.38
Code review	20	$1.05	$0.52
Total	100	$5.26	$1.77

That's a 66% daily cost reduction — from $5.26 to $1.77 — while retaining frontier reasoning on the genuinely hard decisions. Monthly savings: roughly $77. Over a year for a solo developer: $924 saved.
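The article doesn't specify how a sensitivity setting translates into an actual escalation rate, so the Advisor column can't be rederived exactly. As a template for plugging in your own observed rates, here is the same day computed under one explicit assumption: each configuration's sensitivity is treated as the share of tokens billed at the advisor's rates, using the blended model from the savings table:

```python
# Per-task cost in dollars (500 input + 2,000 output tokens) at each model's rates
PER_TASK = {
    "haiku": (500 * 1 + 2_000 * 5) / 1_000_000,    # $0.0105
    "sonnet": (500 * 3 + 2_000 * 15) / 1_000_000,  # $0.0315
    "opus": (500 * 5 + 2_000 * 25) / 1_000_000,    # $0.0525
}

# (task count, primary, advisor, escalation rate).
# Assumption: the recommended sensitivity is used as the escalation rate.
DAY = {
    "feature_implementation": (30, "haiku", "sonnet", 0.15),
    "bug_fixing": (20, "haiku", "opus", 0.25),
    "test_writing": (30, "haiku", "sonnet", 0.10),
    "code_review": (20, "sonnet", "opus", 0.20),
}

def task_cost(primary: str, advisor: str, rate: float) -> float:
    """Blended per-task cost: `rate` of tokens at advisor rates, the rest at primary rates."""
    return (1 - rate) * PER_TASK[primary] + rate * PER_TASK[advisor]

total_opus = total_advisor = 0.0
for name, (count, primary, advisor, rate) in DAY.items():
    mixed = count * task_cost(primary, advisor, rate)
    total_opus += count * PER_TASK["opus"]
    total_advisor += mixed
    print(f"{name:<24} advisor ${mixed:.2f}")
print(f"{'total':<24} opus ${total_opus:.2f}  advisor ${total_advisor:.2f}")
```

Under that assumption the bug-fixing ($0.42) and test-writing ($0.38) rows match the table, while feature implementation ($0.41) and code review ($0.71) come out differently, for a total near $1.92/day (about a 63% saving). Escalation rates measured from your own usage are the more reliable input for sizing the savings.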

Bottom Line

OpenRouter Advisor introduces the most granular model routing available for AI coding. By letting cheap models handle routine generation and escalating only genuinely difficult decisions to frontier models, you can cut AI coding costs by 60-75% while preserving quality where it matters. The key insight: most tokens in a coding response don't need $25/M-quality reasoning. Pay frontier prices only for the 10-20% of decisions that actually benefit from it.
