AI Cost Estimator

Estimate your AI coding costs

← Back to Blog

How to Read Your AI API Invoice: A Line-by-Line Guide for Developers

May 26, 2026 · 7 min read

Why AI Invoices Are Confusing

Unlike a SaaS subscription with a fixed monthly fee, AI API invoices are consumption-based with multiple pricing dimensions: input tokens, output tokens, cached reads, reasoning tokens, and sometimes compute add-ons. First-time recipients often stare at the total and have no idea which feature or workflow drove the largest charges.

This guide covers the standard line items you will see across OpenAI, Anthropic, and Google — the three providers where most developers spend the bulk of their AI budget.

The Core Line Items: Input and Output Tokens

Every AI API invoice starts with two fundamental charges:

  • Input tokens: Everything you send to the model — system prompt, conversation history, tool definitions, code files, and the current user message. Charged at the input rate.
  • Output tokens: Everything the model generates in response. Always more expensive than input tokens, typically 3–6x higher per token.
Provider / Model Input (per 1M) Output (per 1M) Output Multiplier
Claude Sonnet 4.6 $3.00 $15.00 5x
GPT-5.5 $5.00 $30.00 6x
GPT-4.1 $2.00 $8.00 4x
Gemini 3.5 Flash $1.50 $9.00 6x
DeepSeek V4 Flash $0.112 $0.224 2x

The output multiplier is why long-form code generation costs more than you expect. Generating 10,000 lines of code output uses far more budget than reading 10,000 lines of existing code context.

Cache Read Tokens

When you see a line item labeled "cache read tokens" or "cached input tokens," these are input tokens that were served from the provider's cache rather than reprocessed from scratch. Cache reads are significantly cheaper — typically 75–90% discount versus standard input pricing.

A large cache read volume on your invoice is good news: it means your prompt caching is working. If you have a high-volume application and see very few cache reads, you likely have a caching configuration issue worth investigating.

Some invoices also show a "cache write" line item. This is the one-time cost to store content in the cache. On Anthropic, cache write tokens cost 25% more than standard input tokens — the overhead of creating the cache entry, not reading from it.

Reasoning Tokens

Models with extended thinking (Claude's extended thinking mode, OpenAI's o-series reasoning models) generate internal chain-of-thought tokens before producing their response. These reasoning tokens are charged as output tokens but do not appear in the final response — they are invisible to you but visible on your invoice.

This is the most common source of invoice surprise for developers who enable extended thinking expecting better results without expecting a 3–10x increase in output token consumption. A response that reads as 500 output tokens may have required 5,000 reasoning tokens to produce.

Always set a budget_tokens limit on reasoning calls to cap this cost. If you do not see a budget limit in your configuration and extended thinking is enabled, you are potentially paying for unbounded reasoning on every call.

Batch API Discounts

OpenAI and Anthropic both offer batch API pricing — typically 50% off standard rates — for requests that do not need to complete within a few seconds. On your invoice, batch tokens appear as a separate line item with the discounted rate applied automatically.

If you are running asynchronous workflows — automated test generation, documentation updates, code review pipelines that can run overnight — and you do not see any batch API charges, you are paying double the necessary rate for those workloads.

Reading the Breakdown: A Sample Invoice

A typical monthly invoice for a mid-sized development team using Claude Sonnet 4.6 might look like:

Line Item Volume Rate Charge
Input tokens 800M $3.00/M $2,400
Cache read tokens 1.2B $0.30/M $360
Cache write tokens 50M $3.75/M $188
Output tokens 200M $15.00/M $3,000
Total $5,948

In this example, output tokens account for 50% of the bill despite being only 10% of total token volume. The large cache read volume (1.2B tokens) represents effective caching — without it, that volume would be billed at $3.00/M instead of $0.30/M, adding $3,240 to the invoice.

The Bottom Line

Once you can read your AI API invoice fluently, the optimization opportunities become obvious: enable caching to convert expensive input tokens to cheap cache reads, use batch APIs for async workloads, and cap reasoning token budgets to prevent runaway thinking costs.

Use the AI Cost Estimator to project what your invoice will look like as you scale your usage — before you get the actual bill.

Want to calculate exact costs for your project?