Weave Router vs OpenRouter, LiteLLM, and Portkey: When Does Local Model Routing Pay Off?

June 28, 2026 · 9 min read


Another Router Enters the Ring

Weave hit Hacker News on June 28, 2026 as a local-first intelligent model router. Install is a single command, npx @workweave/router, and it runs as a localhost proxy on port 8080. The pitch: a cluster-scoring algorithm based on Avengers-Pro 1 picks the best model per request, your provider keys stay on your machine, and traces are exported over OTLP for self-hosted observability.

Weave joins a crowded category. OpenRouter, LiteLLM, and Portkey have been routing LLM calls for over a year. The question for any team adding a router today is not "which is best in the abstract?" — it's "what does my workload actually need, and when does a local router beat a cloud one?"

Side-by-Side Capability Map

Capability	Weave (Local)	OpenRouter (Cloud)	LiteLLM (Self-Host)	Portkey (Cloud)
Key custody	Local, encrypted	Cloud	Self-host	Cloud
Auto routing	Avengers-Pro 1	Pareto curves	Manual rules	Cluster scoring
Markup on traffic	0%	5%	0%	0-3% tier-based
Observability	OTLP self-host	Built-in dashboard	Self-host	Built-in dashboard
Client compatibility	Claude Code, Codex, Cursor	OpenAI-compat	OpenAI-compat	OpenAI-compat
Data residency control	Full (you hold)	Partial (region routing)	Full	Partial

When the Local Router Wins

Three workloads make Weave (or LiteLLM self-hosted) the right pick over OpenRouter or Portkey:

Sensitive data flows. If your prompts contain customer PII, source code under NDA, or regulated content, sending them through a cloud proxy adds a vendor to your trust boundary. Local routing keeps the data inside your network.

High-volume workloads. OpenRouter's 5% markup on $50K/month of spend is $2,500/month, or $30K/year. That comfortably funds an engineer-day per month of LiteLLM operations. Above $100K/month, the markup gets even harder to justify.

Single-developer or single-team setup. Weave specifically targets the local Claude Code / Codex / Cursor user who wants smart routing without standing up a cloud gateway. The installation is essentially zero-friction.
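The markup arithmetic in the high-volume case is easy to rerun for your own spend. A minimal sketch, assuming a flat percentage markup on all traffic; the function name and the sample spend levels are illustrative, and real pricing tiers may differ:

```python
def annual_markup(monthly_spend_usd: float, markup_rate: float) -> float:
    """Yearly cost of a flat percentage markup on provider spend."""
    return monthly_spend_usd * markup_rate * 12


# Sample spend levels (illustrative) at the 5% markup quoted above.
for spend in (5_000, 50_000, 100_000):
    monthly = spend * 0.05
    yearly = annual_markup(spend, 0.05)
    print(f"${spend:,}/month spend: ${monthly:,.0f}/month markup, ${yearly:,.0f}/year")
```

Compare the yearly figure against what it would cost your team to run and maintain a self-hosted router.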

When the Cloud Router Wins

OpenRouter and Portkey hold three structural advantages that local routers cannot easily replicate:

Provider failover. OpenRouter routes around 30-90 second provider outages automatically using its multi-region presence. A localhost router on your machine cannot do this without you also running multi-region infrastructure.

Built-in spend governance. Cloud routers offer team-level spend caps, per-user quotas, and email alerts out of the box. Replicating these with OTLP + Grafana on a local router is doable but takes 1-2 days of setup.

Negotiated provider rates. OpenRouter has volume contracts with several providers that price-match or beat direct API rates after their markup. For low-volume teams the cloud router is sometimes cheaper than going direct.

A Decision Rule

Three spend tiers make the choice numerical rather than philosophical:

Under $5K/month spend: use OpenRouter or Portkey. The setup and operations cost of a local router is not worth it at this volume.
$5K-$50K/month: Weave for a single team, LiteLLM self-hosted for multiple teams. Cloud router markup starts to bite.
$50K+/month: LiteLLM self-hosted with proper observability, OR direct provider relationships with custom rate cards. Cloud router markup is real money.
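The tiers above can be written down as a small lookup. This is only a sketch of the rule as stated: the function name and the teams parameter are hypothetical, and treating exactly $50K as the top tier is an assumption the text does not settle.

```python
def recommend_router(monthly_spend_usd: float, teams: int = 1) -> str:
    """Map monthly provider spend (and team count) to the tiers in this article."""
    if monthly_spend_usd < 5_000:
        # Setup and operations cost of a local router is not worth it here.
        return "OpenRouter or Portkey"
    if monthly_spend_usd < 50_000:
        # Assumption: "single team" means teams == 1.
        return "Weave" if teams <= 1 else "LiteLLM self-hosted"
    # Assumption: exactly $50K falls into the top tier.
    return "LiteLLM self-hosted or direct provider contracts"


print(recommend_router(2_000))            # OpenRouter or Portkey
print(recommend_router(20_000, teams=3))  # LiteLLM self-hosted
```

Spend is only one input. A team with sensitive data flows may still choose a local router below $5K/month.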

Weave's Specific Niche

Weave is positioned at the smaller end of the market — individual developers and small teams using Claude Code, Codex, or Cursor. The Avengers-Pro 1 cluster scorer is the differentiator: it claims to pick the right model per request based on prompt features (length, code language, task type) rather than blanket rules.

Whether the auto-routing actually saves money over a simpler "Sonnet 4.6 for code, Haiku 4.5 for tools" hand-coded rule depends on workload shape. Run a 1-week shadow test before committing.
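One way to score such a shadow test is to log what the router would have chosen for each request and price both policies over the same traffic. A minimal sketch, assuming you can log token counts and the router's choice per request; the model names and prices are placeholders, not real rates, and the log format is hypothetical:

```python
from dataclasses import dataclass

# Placeholder prices in USD per million tokens (input, output). NOT real rates.
PRICE_PER_MTOK = {
    "large-model": (3.00, 15.00),
    "small-model": (1.00, 5.00),
}


@dataclass
class LoggedRequest:
    task_type: str       # e.g. "code" or "tool"
    input_tokens: int
    output_tokens: int
    router_choice: str   # model the router picked in shadow mode


def hand_rule(req: LoggedRequest) -> str:
    """The simple baseline: large model for code, small model for everything else."""
    return "large-model" if req.task_type == "code" else "small-model"


def cost(model: str, req: LoggedRequest) -> float:
    """Cost in USD of serving one request on the given model."""
    in_price, out_price = PRICE_PER_MTOK[model]
    return (req.input_tokens * in_price + req.output_tokens * out_price) / 1_000_000


def compare(log: list[LoggedRequest]) -> tuple[float, float]:
    """Return (router_total, hand_rule_total) in USD over the same traffic."""
    router_total = sum(cost(r.router_choice, r) for r in log)
    rule_total = sum(cost(hand_rule(r), r) for r in log)
    return router_total, rule_total
```

This compares cost only. A cheaper router choice only counts as a saving if answer quality on those requests holds up, so pair the totals with whatever quality check your team already trusts.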

The Real Cost Lever

The model routing market is now saturated. The router itself contributes 5-15% of total savings. The other 85-95% comes from which models you route to. A router that diligently sends every request to Claude Opus 4.8 will spend more than a hand-coded script that sends most requests to DeepSeek V4-Flash. Pick the router that fits your trust model and team size; spend the engineering time on the routing policy, not the router selection.


Frequently Asked Questions

Is Weave free?

The router itself is free and open source. You pay only for the underlying provider API calls, which Weave forwards using your own keys.

Can I use Weave behind a corporate proxy?

Yes, but you need to allow outbound connections to your OTLP endpoint if you want traces exported. The provider API calls themselves go through standard HTTPS.

Does Weave support self-hosted models like DeepSeek or Llama on your own GPU?

Yes, currently, via OpenAI-compatible endpoints: point Weave at your inference server URL and it routes there like any other provider.

How does Avengers-Pro 1 actually pick models?

It clusters prompts by feature embeddings and historical accuracy/cost data, then selects the model with the best score in that cluster. The training data is publicly documented in the Avengers-Pro 1 paper.
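The general shape of cluster-based routing can be illustrated in a few lines. This is a toy sketch of the idea described above, not Weave's or Avengers-Pro's actual implementation: the two-dimensional features, cluster names, per-cluster statistics, and the accuracy_weight parameter are all hypothetical.

```python
import math

# Hypothetical cluster centroids in a toy 2-D feature space.
# A real system would use prompt embeddings with many more dimensions.
CENTROIDS = {
    "short-tool-call": (0.1, 0.9),
    "long-code-task": (0.9, 0.2),
}

# Hypothetical per-cluster history: model -> (accuracy 0..1, relative cost 0..1).
STATS = {
    "short-tool-call": {"small-model": (0.90, 0.1), "large-model": (0.93, 1.0)},
    "long-code-task": {"small-model": (0.40, 0.1), "large-model": (0.92, 1.0)},
}


def nearest_cluster(features: tuple[float, float]) -> str:
    """Assign the prompt to the cluster with the closest centroid."""
    return min(CENTROIDS, key=lambda name: math.dist(features, CENTROIDS[name]))


def pick_model(features: tuple[float, float], accuracy_weight: float = 0.7) -> str:
    """Pick the model with the best accuracy/cost trade-off in the prompt's cluster."""
    cluster = nearest_cluster(features)

    def score(model: str) -> float:
        accuracy, cost = STATS[cluster][model]
        return accuracy_weight * accuracy - (1 - accuracy_weight) * cost

    return max(STATS[cluster], key=score)


print(pick_model((0.2, 0.8)))  # small-model
print(pick_model((0.8, 0.3)))  # large-model
```

The accuracy_weight knob is where the routing policy lives: raising it favors the more accurate model in every cluster, lowering it favors the cheaper one.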
