How to Audit Your LLM Gateway: Tracking Token Spend Across Multiple Providers

June 22, 2026 · 7 min read


The Multi-Provider Visibility Problem

Most teams using AI coding agents end up with token spend distributed across multiple providers: Claude via Anthropic's API for complex reasoning, GPT-5.4 Mini for high-volume autocomplete, DeepSeek V4 Flash for batch processing. Each provider has its own dashboard, its own billing cycle, and its own way of reporting usage.

Without unified visibility, waste hides in the gaps. A developer might be routing complex tasks to GPT-5.5 ($5/$30 per million input/output tokens) when Claude Sonnet 4.6 ($3/$15 per million) would produce equivalent results. A CI pipeline might be running redundant LLM calls that could be cached. A forgotten experimental integration might be burning $200/month unnoticed.
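To make the price gap concrete, here is a minimal sketch that compares the two models at the prices quoted above. It assumes the "$X/$Y" figures mean input/output price per million tokens; the workload numbers and the dictionary keys are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# Prices as quoted in this article; the dictionary keys are illustrative
# labels, not official API model identifiers.
PRICES = {
    "gpt-5.5": Price(5.00, 30.00),
    "claude-sonnet-4.6": Price(3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at list price."""
    price = PRICES[model]
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


# Hypothetical workload: 10,000 requests per month,
# each with 4,000 input tokens and 1,000 output tokens.
monthly = {m: 10_000 * request_cost(m, 4_000, 1_000) for m in PRICES}
for model, usd in monthly.items():
    print(f"{model}: ${usd:,.2f}/month")
```

Under these assumptions the same workload comes to about $500 per month on the more expensive model and about $270 on the cheaper one, which is the kind of gap that stays invisible when each provider is reviewed in its own dashboard.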

Step 1: Centralize Your Token Accounting

Before you can optimize, you need a single view of all LLM spend. Three approaches, from simplest to most flexible:

Gateway-native analytics (easiest): If you already route through OpenRouter or Portkey, their dashboards show per-model, per-key, and per-request cost breakdowns. OpenRouter's activity page shows real-time spend per model with daily/weekly/monthly aggregation. This requires zero setup beyond using the gateway.

Drop-in observability (Helicone, LangSmith): Helicone acts as a lightweight proxy: you change your base URL, and it logs every request with full token counts and cost calculations. LangSmith collects comparable data through SDK-level tracing rather than by proxying traffic. Helicone supports Anthropic, OpenAI, and other providers simultaneously, giving you a unified dashboard even without a gateway. Setup is roughly one line of code per provider.

Custom logging (most control): Wrap your LLM client with a logging layer that records model, token counts, estimated cost, caller identity, and timestamp to your own database. This gives full control over data retention and custom reporting, but requires engineering time to build and maintain.
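As a rough sketch of the custom route, the class below records each call to a SQLite table. The table name, column names, and the `UsageLog` class are all hypothetical, and the token counts are assumed to come from the usage fields your provider returns with each response.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_usage (
    ts            REAL    NOT NULL,  -- Unix timestamp of the request
    model         TEXT    NOT NULL,
    caller        TEXT    NOT NULL,  -- developer, CI job, or service name
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    est_cost_usd  REAL    NOT NULL
)
"""


class UsageLog:
    """Minimal usage ledger; prices are (input, output) USD per million tokens."""

    def __init__(self, path=":memory:", prices=None):
        self.db = sqlite3.connect(path)
        self.db.execute(SCHEMA)
        self.prices = prices or {}

    def record(self, model, caller, input_tokens, output_tokens, ts=None):
        # Unknown models are logged at zero cost rather than failing the call;
        # a production version should surface these so spend is not hidden.
        in_price, out_price = self.prices.get(model, (0.0, 0.0))
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.db.execute(
            "INSERT INTO llm_usage VALUES (?, ?, ?, ?, ?, ?)",
            (time.time() if ts is None else ts, model, caller,
             input_tokens, output_tokens, cost),
        )
        self.db.commit()
        return cost

    def spend_by_model(self):
        rows = self.db.execute(
            "SELECT model, SUM(est_cost_usd) FROM llm_usage GROUP BY model"
        )
        return dict(rows.fetchall())


log = UsageLog(prices={"claude-sonnet-4.6": (3.0, 15.0)})
cost = log.record("claude-sonnet-4.6", "ci-pipeline", 2_000, 500)
```

Call `record` right after each provider response returns. Keeping the price table in your own code means the estimates can drift from the invoice, so it is worth reconciling against the provider's bill from time to time.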

Step 2: Tag Requests by Purpose

Raw token counts are not actionable without context. Tag every LLM request with metadata that helps you understand where money goes:

Purpose: autocomplete, code-review, refactoring, test-generation, debugging
Caller: which developer, CI pipeline, or automated system made the request
Project: which repository or service the request relates to
Outcome: did the response get accepted, rejected, or regenerated

OpenRouter supports custom headers for metadata tagging. Helicone uses a Helicone-Property-* header pattern. For custom solutions, add a wrapper function that attaches metadata to every call.
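A small helper can keep tag names and values consistent across callers. The sketch below builds headers following the Helicone-Property-* pattern mentioned above; the function name, the tag names, and the fixed purpose list are assumptions for illustration, and the prefix can be swapped for whatever your gateway expects.

```python
# Purposes taken from the list above; extend to match your own taxonomy.
ALLOWED_PURPOSES = {
    "autocomplete", "code-review", "refactoring", "test-generation", "debugging",
}


def tag_headers(purpose, caller, project, prefix="Helicone-Property-"):
    """Return request headers carrying purpose, caller, and project tags."""
    if purpose not in ALLOWED_PURPOSES:
        # Rejecting unknown values keeps free-form tags out of the reports.
        raise ValueError(f"unknown purpose: {purpose!r}")
    tags = {"Purpose": purpose, "Caller": caller, "Project": project}
    return {f"{prefix}{name}": value for name, value in tags.items()}


headers = tag_headers("code-review", "alice", "billing-service")
```

Outcome is left out on purpose: whether a response was accepted, rejected, or regenerated is only known after the request completes, so it has to be recorded separately, for example against a request ID in your own log.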

Step 3: Identify the Five Common Waste Patterns

Once you have tagged spend data, look for these patterns that consistently appear in team audits:

1. Model overqualification: Using Claude Opus 4.8 ($5/$25 per M) for tasks that Claude Sonnet 4.6 ($3/$15 per M) handles equally well. Common in autocomplete and simple code generation. Fix: route by task complexity.

2. Redundant regeneration: The same prompt sent multiple times because the developer rejected output and retried without changing the prompt. Fix: prompt caching and better initial prompts.

3. Context bloat: Sending the entire file (or multiple files) as context when only a small section is relevant. A 5,000-token context that could be 500 tokens costs 10x more in input tokens on every request. Fix: implement context windowing, sending only the relevant section plus a few surrounding lines.

4. Abandoned experiments: API keys from prototype integrations that are still active and accumulating charges. Fix: monthly key audit, set expiration dates.

5. Error retry storms: Automated systems that retry failed requests aggressively, burning tokens on requests that will never succeed (wrong prompt format, missing context). Fix: exponential backoff with a token budget cap per retry chain.
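The fix for pattern 5 can be sketched as a retry loop that stops on either an attempt limit or a token budget, whichever comes first. Everything here is hypothetical scaffolding: `send` stands in for your own request function and is assumed to return a success flag, the tokens it consumed, and the result.

```python
import time


class BudgetExceeded(Exception):
    """Raised when a retry chain has consumed its token budget."""


def call_with_budget(send, max_attempts=5, token_budget=20_000,
                     base_delay=1.0, sleep=time.sleep):
    """Retry send() with exponential backoff, capped by total tokens spent.

    send() must return a tuple (ok, tokens_used, result).
    Returns (result, tokens_spent) on success.
    """
    spent = 0
    for attempt in range(max_attempts):
        ok, tokens_used, result = send()
        spent += tokens_used
        if ok:
            return result, spent
        if spent >= token_budget:
            raise BudgetExceeded(f"spent {spent} tokens without success")
        if attempt < max_attempts - 1:
            # Delays grow as base_delay * 1, 2, 4, 8, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts ({spent} tokens)")
```

Backoff alone only slows a retry storm down; the budget cap is what bounds its cost. In practice it also helps to skip retries entirely for errors that cannot succeed on a second try, such as a malformed prompt.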

Step 4: Set Budget Alerts and Hard Limits

Alerts prevent surprises. Hard limits prevent disasters. Configure both:

Soft alerts (warnings): Notify a Slack channel when daily spend exceeds 120% of the trailing 7-day average, or when a single developer exceeds their weekly allocation. These catch anomalies early without blocking work.

Hard limits (cutoffs): Automatically disable API keys or reject requests when a project exceeds its monthly budget. Essential for CI/CD pipelines where runaway loops can burn thousands in minutes. OpenRouter, Portkey, and Helicone all support per-key spending limits.

A practical alert structure for a team spending $3,000/month across Claude and GPT models:

Daily alert: > $150/day (50% over expected $100/day average)
Per-developer alert: > $30/day (flags unusual individual usage)
Hard limit: $4,000/month total (33% buffer above expected)
CI pipeline hard limit: $500/month (prevents automated runaway)
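If your tooling does not provide these checks out of the box, the logic is small enough to run as a scheduled job. The sketch below is one possible shape, with a hypothetical function name and return values; it uses the example thresholds above (alert at 1.5x the daily average, hard stop at $4,000 per month).

```python
from statistics import mean


def check_spend(today_usd, trailing_7d_usd, month_to_date_usd,
                monthly_hard_limit_usd, soft_ratio=1.5):
    """Return "block", "warn", or "ok" for the current spend figures.

    trailing_7d_usd is a list of the previous seven daily totals.
    """
    if month_to_date_usd >= monthly_hard_limit_usd:
        return "block"  # hard limit: reject requests or disable keys
    if today_usd > soft_ratio * mean(trailing_7d_usd):
        return "warn"   # soft alert: notify a channel, do not block work
    return "ok"


status = check_spend(
    today_usd=160,
    trailing_7d_usd=[100] * 7,
    month_to_date_usd=2_000,
    monthly_hard_limit_usd=4_000,
)
```

The hard limit is checked first so that a budget breach is never downgraded to a warning. Per-developer and per-pipeline limits can reuse the same function with their own totals and thresholds.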

Step 5: Monthly Audit Cadence

Set a monthly review to catch drift. The audit takes 30 minutes and covers:

Total spend vs. budget: are you on track?
Model mix: could cheaper models handle any current Opus/GPT-5.5 traffic?
Top 5 callers: is any single integration or developer disproportionately expensive?
Rejection rate: what percentage of generated code gets immediately discarded?
Active keys: are all API keys still needed? Revoke any unused for 30+ days.
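Most of this checklist can be computed straight from tagged usage records. The following is a minimal sketch, assuming each record is a dictionary with hypothetical `caller`, `cost_usd`, and `outcome` fields, and that you can look up a last-used timestamp for each API key.

```python
from collections import Counter
from datetime import datetime, timedelta


def audit(records, key_last_used, now, top_n=5, stale_days=30):
    """Summarize one month of usage records for the monthly review."""
    spend_by_caller = Counter()
    rejected = 0
    for r in records:
        spend_by_caller[r["caller"]] += r["cost_usd"]
        if r["outcome"] == "rejected":
            rejected += 1
    stale_keys = sorted(
        key for key, last_used in key_last_used.items()
        if now - last_used >= timedelta(days=stale_days)
    )
    return {
        "total_usd": sum(spend_by_caller.values()),
        "top_callers": spend_by_caller.most_common(top_n),
        "rejection_rate": rejected / len(records) if records else 0.0,
        "stale_keys": stale_keys,  # candidates for revocation
    }


report = audit(
    records=[
        {"caller": "ci", "cost_usd": 2.0, "outcome": "accepted"},
        {"caller": "alice", "cost_usd": 1.5, "outcome": "rejected"},
        {"caller": "ci", "cost_usd": 0.5, "outcome": "accepted"},
    ],
    key_last_used={
        "prototype-key": datetime(2026, 5, 1),
        "prod-key": datetime(2026, 6, 20),
    },
    now=datetime(2026, 6, 22),
)
```

Model mix is the one item that still needs human judgment: the numbers show where expensive models are used, but deciding whether a cheaper model would do takes a look at sample outputs.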

Tool Comparison for Token Audit

Tool	Setup Effort	Multi-Provider	Budget Alerts	Cost
OpenRouter	Minimal (use as gateway)	Yes (all models)	Per-key limits	Token markup
Helicone	Low (URL change)	Yes	Yes	Free tier + paid
Portkey	Low (SDK wrapper)	Yes	Granular policies	Platform fee
Custom logging	High (build it)	Yes	Custom	Infra cost only

Frequently Asked Questions

What is the easiest way to track LLM token spend across Claude, GPT, and other providers?

Use a proxy-based tool like Helicone that logs requests to any provider with a one-line URL change. It calculates costs automatically and provides a unified dashboard. If you already use OpenRouter as a gateway, its built-in analytics cover all models routed through it.

How do I set budget alerts to prevent unexpected LLM costs?

OpenRouter, Portkey, and Helicone all support per-key spending limits. Set a daily alert at 150% of your average daily spend and a hard monthly limit at roughly 130% of your budget (for example, $4,000 on a $3,000 budget). For CI pipelines, set a separate lower limit to prevent automated runaway costs.

What are the most common sources of wasted LLM tokens in coding teams?

Model overqualification (using expensive models for simple tasks), redundant regeneration (retrying without changing prompts), context bloat (sending full files when snippets suffice), abandoned API keys, and error retry storms in automated pipelines.

How often should teams audit their LLM token spending?

Monthly is the recommended cadence. A 30-minute review covering total spend vs. budget, model mix optimization, top callers, rejection rates, and unused API keys is sufficient to catch drift and waste before it accumulates.
