
How to Audit Your LLM Gateway: Tracking Token Spend Across Multiple Providers

June 22, 2026 · 7 min read


The Multi-Provider Visibility Problem

Most teams using AI coding agents end up with token spend distributed across multiple providers: Claude via Anthropic's API for complex reasoning, GPT-5.4 Mini for high-volume autocomplete, DeepSeek V4 Flash for batch processing. Each provider has its own dashboard, its own billing cycle, and its own way of reporting usage.

Without unified visibility, waste hides in the gaps. A developer might be routing complex tasks to GPT-5.5 ($5/$30 per million input/output tokens) when Claude Sonnet 4.6 ($3/$15 per million) would produce equivalent results. A CI pipeline might be running redundant LLM calls that could be cached. A forgotten experimental integration might be burning $200/month unnoticed.

Step 1: Centralize Your Token Accounting

Before you can optimize, you need a single view of all LLM spend. Three approaches, from simplest to most flexible:

Gateway-native analytics (easiest): If you already route through OpenRouter or Portkey, their dashboards show per-model, per-key, and per-request cost breakdowns. OpenRouter's activity page shows real-time spend per model with daily/weekly/monthly aggregation. This requires zero setup beyond using the gateway.

Proxy-based observability (Helicone, LangSmith): These tools sit in the request path or instrument your client, and they log every request with full token counts and cost calculations. With a proxy such as Helicone, you change your base URL; it supports Anthropic, OpenAI, and other providers simultaneously, giving you a unified dashboard even without a gateway. Setup is typically a one-line change per provider.

Custom logging (most control): Wrap your LLM client with a logging layer that records model, token counts, estimated cost, caller identity, and timestamp to your own database. This gives full control over data retention and custom reporting, but requires engineering time to build and maintain.
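As a rough illustration of the custom logging approach, the sketch below records each request in a local SQLite table and estimates cost from token counts. The table layout, the function names (open_ledger, log_usage, spend_by_model), and the model identifiers are assumptions made for this example. The prices are the per-million figures quoted in this article and should be replaced with your providers' current rates.

```python
import sqlite3
import time

# Assumed per-million-token prices (input, output) in USD, copied from the
# figures quoted in this article. The model identifiers are hypothetical keys.
PRICES_PER_M = {
    "claude-sonnet-4.6": (3.00, 15.00),
    "gpt-5.5": (5.00, 30.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request from its token counts."""
    in_price, out_price = PRICES_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def open_ledger(path=":memory:"):
    """Open (or create) the usage ledger. The default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS llm_usage (
               ts REAL, model TEXT, caller TEXT,
               input_tokens INTEGER, output_tokens INTEGER, cost_usd REAL)"""
    )
    return conn


def log_usage(conn, model, caller, input_tokens, output_tokens, ts=None):
    """Record one request and return its estimated cost."""
    cost = estimate_cost(model, input_tokens, output_tokens)
    conn.execute(
        "INSERT INTO llm_usage VALUES (?, ?, ?, ?, ?, ?)",
        (time.time() if ts is None else ts, model, caller,
         input_tokens, output_tokens, cost),
    )
    conn.commit()
    return cost


def spend_by_model(conn):
    """Return total estimated spend per model as a dict."""
    rows = conn.execute(
        "SELECT model, SUM(cost_usd) FROM llm_usage GROUP BY model"
    )
    return dict(rows.fetchall())
```

In practice you would call log_usage right after each provider response, using the token counts the provider reports, so that every provider writes to the same table.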

Step 2: Tag Requests by Purpose

Raw token counts are not actionable without context. Tag every LLM request with metadata that helps you understand where money goes:

  • Purpose: autocomplete, code-review, refactoring, test-generation, debugging
  • Caller: which developer, CI pipeline, or automated system made the request
  • Project: which repository or service the request relates to
  • Outcome: did the response get accepted, rejected, or regenerated

OpenRouter supports custom headers for metadata tagging. Helicone uses a Helicone-Property-* header pattern. For custom solutions, add a wrapper function that attaches metadata to every call.
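A minimal sketch of such a wrapper helper, assuming the Helicone-Property-* header pattern mentioned above. The function name tagging_headers and the property names (Purpose, Caller, Project) are illustrative choices, not a required schema. Outcome is left out because it is only known after the response arrives, so it has to be recorded separately.

```python
# Purposes from the tagging list above; extend this set to match your own taxonomy.
ALLOWED_PURPOSES = {
    "autocomplete", "code-review", "refactoring", "test-generation", "debugging",
}


def tagging_headers(purpose, caller, project):
    """Build request headers that tag one LLM call with audit metadata."""
    if purpose not in ALLOWED_PURPOSES:
        # Rejecting unknown values keeps the spend report from fragmenting
        # into near-duplicate tags such as "review" and "code-review".
        raise ValueError(f"unknown purpose tag: {purpose!r}")
    tags = {"Purpose": purpose, "Caller": caller, "Project": project}
    return {f"Helicone-Property-{name}": str(value) for name, value in tags.items()}
```

The returned dict can be merged into whatever extra-headers mechanism your HTTP client or provider SDK exposes, so call sites only need to state the three tags.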

Step 3: Identify the Five Common Waste Patterns

Once you have tagged spend data, look for these patterns that consistently appear in team audits:

1. Model overqualification: Using Claude Opus 4.8 ($5/$25 per M) for tasks that Claude Sonnet 4.6 ($3/$15 per M) handles equally well. Common in autocomplete and simple code generation. Fix: route by task complexity.

2. Redundant regeneration: The same prompt sent multiple times because the developer rejected the output and retried without changing the prompt. Fix: prompt caching to cut the cost of the repeated input, and better initial prompts to reduce the number of retries.

3. Context bloat: Sending the entire file (or multiple files) as context when only a small section is relevant. A 5,000-token context that could be 500 tokens costs 10x more in input tokens on every request. Fix: implement context windowing, sending only the relevant section plus the minimum surrounding code.

4. Abandoned experiments: API keys from prototype integrations that are still active and accumulating charges. Fix: monthly key audit, set expiration dates.

5. Error retry storms: Automated systems that retry failed requests aggressively, burning tokens on requests that will never succeed (wrong prompt format, missing context). Fix: exponential backoff with a token budget cap per retry chain.
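The fix for retry storms (pattern 5) can be sketched as follows. This is one possible shape under stated assumptions: send is a hypothetical callable that performs one request and returns (ok, tokens_used, result), and the default limits are placeholders rather than recommended values.

```python
import time


class RetryBudgetExceeded(Exception):
    """Raised when a retry chain runs out of attempts or tokens."""


def call_with_budget(send, max_attempts=4, token_budget=20_000,
                     base_delay=1.0, sleep=time.sleep):
    """Retry send() with exponential backoff under a per-chain token cap.

    send is a hypothetical callable returning (ok, tokens_used, result).
    Returns (result, tokens_spent) on success.
    """
    spent = 0
    for attempt in range(max_attempts):
        ok, tokens_used, result = send()
        spent += tokens_used
        if ok:
            return result, spent
        if spent >= token_budget:
            # Stop the chain as soon as the budget is used up,
            # even if attempts remain.
            raise RetryBudgetExceeded(
                f"spent {spent} tokens over {attempt + 1} attempts"
            )
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise RetryBudgetExceeded(
        f"gave up after {max_attempts} attempts ({spent} tokens)"
    )
```

The cap bounds the worst-case cost of any single chain. Errors that can never succeed, such as a malformed prompt, are better failed immediately than retried at all, which this sketch leaves to the caller.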

Step 4: Set Budget Alerts and Hard Limits

Alerts prevent surprises. Hard limits prevent disasters. Configure both:

Soft alerts (warnings): Notify a Slack channel when daily spend exceeds 120% of the trailing 7-day average, or when a single developer exceeds their weekly allocation. These catch anomalies early without blocking work.

Hard limits (cutoffs): Automatically disable API keys or reject requests when a project exceeds its monthly budget. Essential for CI/CD pipelines where runaway loops can burn thousands in minutes. OpenRouter, Portkey, and Helicone all support per-key spending limits.

A practical alert structure for a team spending $3,000/month across Claude and GPT models:

  • Daily alert: > $150/day (50% over expected $100/day average)
  • Per-developer alert: > $30/day (flags unusual individual usage)
  • Hard limit: $4,000/month total (33% buffer above expected)
  • CI pipeline hard limit: $500/month (prevents automated runaway)
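The alert structure above could be evaluated once a day with a check along these lines. It is a sketch, not a gateway feature: the function names and the alert labels are made up for this example, the thresholds default to the $3,000/month figures above, and the 1.2 ratio is the 120% trailing-average rule from the soft alerts paragraph.

```python
from statistics import mean


def daily_alerts(today_spend, trailing_daily_spend, per_developer_today,
                 daily_threshold=150.0, anomaly_ratio=1.2, per_dev_threshold=30.0):
    """Return the list of soft alerts triggered by today's spend (USD)."""
    alerts = []
    if today_spend > daily_threshold:
        alerts.append("daily-threshold")
    # Compare against the trailing average (e.g. the last 7 days) when available.
    if trailing_daily_spend and today_spend > anomaly_ratio * mean(trailing_daily_spend):
        alerts.append("above-trailing-average")
    for developer, spend in sorted(per_developer_today.items()):
        if spend > per_dev_threshold:
            alerts.append(f"developer:{developer}")
    return alerts


def over_hard_limit(month_to_date, limit):
    """True once month-to-date spend has reached the hard limit."""
    return month_to_date >= limit
```

Soft alerts only produce labels to forward to a channel such as Slack. The hard limit check is meant to gate requests, for example over_hard_limit(ci_spend, 500.0) for the CI pipeline and over_hard_limit(total_spend, 4000.0) for the team.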

Step 5: Monthly Audit Cadence

Set a monthly review to catch drift. The audit takes 30 minutes and covers:

  • Total spend vs. budget: are you on track?
  • Model mix: could cheaper models handle any current Opus/GPT-5.5 traffic?
  • Top 5 callers: is any single integration or developer disproportionately expensive?
  • Rejection rate: what percentage of generated code gets immediately discarded?
  • Active keys: are all API keys still needed? Revoke any unused for 30+ days.
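Several of these checks reduce to small queries over the tagged usage records. The sketch below assumes each record is a dict with caller, cost_usd, and outcome fields, and that you can export a last-used timestamp per API key. Both shapes are assumptions for illustration, not a format any of the tools above guarantee.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_callers(records, n=5):
    """Return the n most expensive callers as (caller, total_cost) pairs."""
    totals = Counter()
    for record in records:
        totals[record["caller"]] += record["cost_usd"]
    return totals.most_common(n)


def rejection_rate(records):
    """Fraction of requests whose output was rejected outright."""
    outcomes = [record["outcome"] for record in records]
    if not outcomes:
        return 0.0
    return outcomes.count("rejected") / len(outcomes)


def stale_keys(last_used, now, max_idle_days=30):
    """Return key names that have been idle for max_idle_days or longer."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(key for key, used_at in last_used.items() if used_at <= cutoff)
```

Running these against a month of records gives the top callers, rejection rate, and revocation candidates to bring into the review.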

Tool Comparison for Token Audit

Tool | Setup Effort | Multi-Provider | Budget Alerts | Cost
OpenRouter | Minimal (use as gateway) | Yes (all models) | Per-key limits | Token markup
Helicone | Low (URL change) | Yes | Yes | Free tier + paid
Portkey | Low (SDK wrapper) | Yes | Granular policies | Platform fee
Custom logging | High (build it) | Yes | Custom | Infra cost only

Frequently Asked Questions

What is the easiest way to track LLM token spend across Claude, GPT, and other providers?

Use a proxy-based tool like Helicone that logs requests to any provider with a one-line URL change. It calculates costs automatically and provides a unified dashboard. If you already use OpenRouter as a gateway, its built-in analytics cover all models routed through it.

How do I set budget alerts to prevent unexpected LLM costs?

OpenRouter, Portkey, and Helicone all support per-key spending limits. Set a daily alert at 150% of your average daily spend and a hard monthly limit at 130% of your budget. For CI pipelines, set a separate lower limit to prevent automated runaway costs.

What are the most common sources of wasted LLM tokens in coding teams?

Model overqualification (using expensive models for simple tasks), redundant regeneration (retrying without changing prompts), context bloat (sending full files when snippets suffice), abandoned API keys, and error retry storms in automated pipelines.

How often should teams audit their LLM token spending?

Monthly is the recommended cadence. A 30-minute review covering total spend vs. budget, model mix optimization, top callers, rejection rates, and unused API keys is sufficient to catch drift and waste before it accumulates.
