How to Read Your AI API Bill: A Line-by-Line Breakdown
May 31, 2026 · 8 min read
Why AI Bills Are Hard to Read
A traditional SaaS bill is simple: $X per seat, $Y per month. An AI API bill is a consumption statement with multiple line items, each representing a different type of token usage across potentially dozens of models. If you have ever opened your Anthropic or OpenAI invoice and felt confused, you are not alone. This guide walks through every component so you know exactly what you are paying for.
The Core Components: Input vs Output Tokens
Every AI API bill starts with the same fundamental split: input tokens and output tokens. These are priced differently because they have different computational costs.
- Input tokens are everything you send to the model: your system prompt, conversation history, the user's message, any documents or code you include as context. Input processing is computationally cheaper because the model processes all tokens in parallel.
- Output tokens are everything the model generates in response. Output is more expensive because tokens are generated sequentially: each token depends on all previous tokens. This sequential dependency is why output typically costs several times more than input per token, from 2x to 6x for the models in the table below.
| Model | Input (per 1M) | Output (per 1M) | Output/Input Ratio |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | 5x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 5x |
| Claude Haiku 4.5 | $1.00 | $5.00 | 5x |
| GPT-5.5 | $5.00 | $30.00 | 6x |
| DeepSeek V4 Flash | $0.14 | $0.28 | 2x |
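To see how the input/output split turns into a per-request charge, here is a minimal sketch that applies the list prices from the table above. The `request_cost` helper and the 8,000-input / 1,000-output request are illustrative assumptions, not figures from any real invoice.

```python
# USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.5": (5.00, 30.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request at list prices (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 8,000 input tokens and 1,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 8_000, 1_000):.4f}")
```

Even with eight times as many input tokens as output tokens, output accounts for a large share of the total on the 5x and 6x models, which is why the two line items are worth reading separately.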
Cache Read Tokens: The Discount Line Item
If you use prompt caching, your bill will include a third line item: cache read tokens. These are input tokens that were served from cache rather than processed fresh. Cache reads are priced at roughly 10% of standard input token prices.
On Anthropic's billing, you will see three input-related charges:
- Input tokens (standard): Fresh tokens processed at full price
- Cache write tokens: Tokens written to cache (priced at 125% of standard input for the default 5-minute cache lifetime, a one-time cost to create the cache)
- Cache read tokens: Tokens served from cache (priced at ~10% of standard input)
If your bill shows a large cache read line item, that is good news — it means your caching is working and you are saving money. If you have no cache read tokens despite sending the same system prompt repeatedly, you are leaving significant savings on the table.
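The savings from caching can be estimated with a short calculation. The sketch below uses the multipliers quoted above (125% for a cache write, about 10% for a cache read) and assumes the simplest case: one write followed by reads on every later request, with the cache never expiring in between. The function name and the example workload are hypothetical.

```python
def cached_prefix_cost(prefix_tokens, requests, input_price_per_m,
                       write_multiplier=1.25, read_multiplier=0.10):
    """Return (uncached_cost, cached_cost) in USD for a repeated prompt prefix.

    Assumes one cache write, then (requests - 1) cache reads, with no expiry
    between requests. Real workloads may need more than one write.
    """
    base = prefix_tokens * input_price_per_m / 1_000_000
    uncached = base * requests
    cached = base * write_multiplier + base * read_multiplier * (requests - 1)
    return uncached, cached


# Hypothetical workload: a 10,000-token system prompt sent 100 times
# at $3.00 per 1M input tokens.
uncached, cached = cached_prefix_cost(10_000, 100, 3.00)
print(f"Without caching: ${uncached:.4f}")
print(f"With caching:    ${cached:.4f}")
```

Under these assumptions the write premium is recovered on the first cache read, since one write plus one read (1.35x) already costs less than two uncached sends (2x).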
Reasoning Tokens: The Hidden Cost of Thinking Models
Models with extended thinking or reasoning capabilities — like Claude's extended thinking mode or OpenAI's o-series — generate internal reasoning tokens before producing their final response. These reasoning tokens are billed as output tokens even though you never see them.
A single reasoning request can generate 5,000–20,000 reasoning tokens before producing a 500-token response. At Claude Sonnet 4.6 output pricing of $15 per million tokens, that is $0.075–$0.30 in reasoning tokens alone per request. For complex tasks where reasoning is valuable, this is worth it. For simple tasks, it is waste.
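Check your bill or usage data for a "reasoning tokens" or "thinking tokens" figure. Some providers report it separately, while others fold it into the output token total. If it is large relative to your visible output tokens, consider whether you are using extended thinking for tasks that do not require it.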
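The per-request arithmetic in this section can be reproduced with a few lines. This is a sketch under the figures stated above (5,000 to 20,000 reasoning tokens, a 500-token visible response, $15 per 1M output tokens); the helper name is hypothetical.

```python
def output_cost(tokens, output_price_per_m):
    """Return the USD cost of a number of output tokens (hypothetical helper)."""
    return tokens * output_price_per_m / 1_000_000


SONNET_OUTPUT_PRICE = 15.00  # USD per 1M output tokens, from the table earlier
VISIBLE_TOKENS = 500         # the final response the user actually reads

visible = output_cost(VISIBLE_TOKENS, SONNET_OUTPUT_PRICE)
for reasoning_tokens in (5_000, 20_000):
    reasoning = output_cost(reasoning_tokens, SONNET_OUTPUT_PRICE)
    print(f"{reasoning_tokens:>6} reasoning tokens: ${reasoning:.4f} "
          f"(visible answer: ${visible:.4f})")
```

In both cases the reasoning tokens cost far more than the visible answer, which is why the reasoning line deserves its own check.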
Batch API Discounts
Both Anthropic and OpenAI offer batch processing APIs that process requests asynchronously (typically within 24 hours) at 50% of standard pricing. If your bill shows batch API line items, those are your cheapest tokens. If you have tasks that do not require real-time responses, such as documentation generation, test writing, or code analysis, and you are not using batch processing, you are paying twice what you need to for that work.
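A rough way to size the opportunity is to estimate what share of your spend could tolerate a delay. The sketch below assumes the 50% discount described above; the function name, the monthly spend, and the batchable share are hypothetical inputs you would replace with your own.

```python
def batch_savings(monthly_spend, batchable_fraction, batch_discount=0.50):
    """Return estimated monthly savings in USD from moving work to a batch API.

    batchable_fraction is the share of spend (0.0 to 1.0) that does not need
    real-time responses. This is an estimate you supply, not a billed figure.
    """
    if not 0.0 <= batchable_fraction <= 1.0:
        raise ValueError("batchable_fraction must be between 0 and 1")
    return monthly_spend * batchable_fraction * batch_discount


# Hypothetical: $2,000 per month, 40% of it from non-urgent jobs.
print(f"Estimated savings: ${batch_savings(2_000, 0.40):.2f} per month")
```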
Reading the Usage Dashboard
Beyond the invoice, both Anthropic and OpenAI provide usage dashboards that break down consumption by model, date, and API key. The most useful views:
- Cost by model: Identifies which models are driving your bill. If Opus is your top cost driver but most of your tasks are routine, you have a model selection problem.
- Cost by day: Identifies usage spikes. A day with 10x normal spending usually indicates a runaway agent loop or an accidental large context submission.
- Input vs output ratio: A healthy ratio for most coding tasks is 3:1 to 5:1 input to output. If your output tokens exceed your input tokens, you may have prompts that are generating unnecessarily verbose responses.
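If you export dashboard data, the spike and ratio checks above are easy to automate. The following is a minimal sketch: the data shapes, function names, and sample numbers are assumptions, and the median is used as the "normal" baseline so that a single spike does not distort it.

```python
from statistics import median


def spike_days(daily_costs, threshold=10.0):
    """Return the days whose cost is at least threshold x the median daily cost."""
    baseline = median(daily_costs.values())
    if baseline <= 0:
        return []
    return [day for day, cost in daily_costs.items()
            if cost >= threshold * baseline]


def input_output_ratio(input_tokens, output_tokens):
    """Return input tokens per output token (infinity if there was no output)."""
    if output_tokens == 0:
        return float("inf")
    return input_tokens / output_tokens


# Hypothetical export: cost in USD per day.
daily_costs = {
    "2026-05-01": 12.00,
    "2026-05-02": 11.50,
    "2026-05-03": 130.00,
    "2026-05-04": 12.50,
}
print("Spike days:", spike_days(daily_costs))
print("Input:output ratio:", input_output_ratio(4_000_000, 1_000_000))
```

A ratio below 1.0 corresponds to the case described above where output tokens exceed input tokens.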
The Three Questions to Ask About Every Bill
When reviewing your AI API bill, three questions identify the biggest optimization opportunities:
- Am I using the right model for each task? If Opus is your top cost driver, audit whether those tasks actually require Opus-level capability.
- Am I using prompt caching? If you have no cache read tokens, you are paying full price for repeated context.
- Do I have any runaway processes? A single day with 10x normal spending is a red flag for an agent loop or misconfigured automation.
Use the AI Cost Estimator to model what your workload should cost across different models, and compare that against your actual bill to identify where you are overspending.