How to Read Your AI API Bill: A Line-by-Line Breakdown

By Eric Bush · May 31, 2026 · 8 min read

Why AI Bills Are Hard to Read

A traditional SaaS bill is simple: $X per seat, $Y per month. An AI API bill is a consumption statement with multiple line items, each representing a different type of token usage across potentially dozens of models. If you have ever opened your Anthropic or OpenAI invoice and felt confused, you are not alone. This guide walks through every component so you know exactly what you are paying for.

The Core Components: Input vs Output Tokens

Every AI API bill starts with the same fundamental split: input tokens and output tokens. These are priced differently because they have different computational costs.

Input tokens are everything you send to the model: your system prompt, conversation history, the user's message, any documents or code you include as context. Input processing is computationally cheaper because the model processes all tokens in parallel.
Output tokens are everything the model generates in response. Output is more expensive because tokens are generated sequentially: each token depends on all previous tokens. This sequential dependency is why output costs several times more than input per token, from 2x to 6x for the models in the table below.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Output/Input Ratio
Claude Opus 4.7	$5.00	$25.00	5x
Claude Sonnet 4.6	$3.00	$15.00	5x
Claude Haiku 4.5	$1.00	$5.00	5x
GPT-5.5	$5.00	$30.00	6x
DeepSeek V4 Flash	$0.14	$0.28	2x
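
As a rough sketch of how the two line items combine, the snippet below computes the cost of a single request from the per-million prices in the table above. The helper name `request_cost` and the example token counts are illustrative assumptions, not part of any provider SDK.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.5": (5.00, 30.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed example workload: 20,000 input tokens and 1,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 20_000, 1_000):.4f}")
```

With this assumed workload, the same request costs $0.075 on Claude Sonnet 4.6 and $0.125 on Claude Opus 4.7, which is the kind of gap the invoice's per-model line items make visible.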

Cache Read Tokens: The Discount Line Item

If you use prompt caching, your bill will include a third line item: cache read tokens. These are input tokens that were served from cache rather than processed fresh. Cache reads are priced at roughly 10% of standard input token prices.

On Anthropic's billing, you will see three input-related charges:

Input tokens (standard): Fresh tokens processed at full price
Cache write tokens: Tokens written to cache (priced at 125% of standard input — a one-time cost to create the cache)
Cache read tokens: Tokens served from cache (priced at ~10% of standard input)

If your bill shows a large cache read line item, that is good news — it means your caching is working and you are saving money. If you have no cache read tokens despite sending the same system prompt repeatedly, you are leaving significant savings on the table.
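
To make the savings concrete, here is a minimal sketch that compares a repeated system prompt with and without caching, using the 125% write and roughly 10% read multipliers quoted above. It assumes the first call writes the cache and every later call hits it (no expiry between calls); the function names and the example workload are hypothetical.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # cache write: 125% of standard input (from the text)
CACHE_READ_MULTIPLIER = 0.10   # cache read: roughly 10% of standard input (from the text)


def uncached_cost(prompt_tokens: int, calls: int, input_price_per_million: float) -> float:
    """Cost of sending the same prompt at full input price on every call."""
    return calls * prompt_tokens * input_price_per_million / 1_000_000


def cached_cost(prompt_tokens: int, calls: int, input_price_per_million: float) -> float:
    """Cost when the first call writes the cache and all later calls read it.

    Assumption: the cache never expires between calls.
    """
    if calls == 0:
        return 0.0
    per_token = input_price_per_million / 1_000_000
    write = prompt_tokens * per_token * CACHE_WRITE_MULTIPLIER
    reads = (calls - 1) * prompt_tokens * per_token * CACHE_READ_MULTIPLIER
    return write + reads


# Assumed example: a 10,000-token system prompt sent 100 times at $3.00 per 1M.
print(f"uncached: ${uncached_cost(10_000, 100, 3.00):.4f}")
print(f"cached:   ${cached_cost(10_000, 100, 3.00):.4f}")
```

Under these assumptions the prompt costs $3.00 uncached and about $0.33 cached. A single call is slightly more expensive with caching because of the write premium, so the discount only pays off from the second cache hit onward.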

Reasoning Tokens: The Hidden Cost of Thinking Models

Models with extended thinking or reasoning capabilities — like Claude's extended thinking mode or OpenAI's o-series — generate internal reasoning tokens before producing their final response. These reasoning tokens are billed as output tokens even though you never see them.

A single reasoning request can generate 5,000–20,000 reasoning tokens before producing a 500-token response. At Claude Sonnet 4.6 output pricing of $15 per million tokens, that is $0.075–$0.30 in reasoning tokens alone per request. For complex tasks where reasoning is valuable, this is worth it. For simple tasks, it is waste.

Check your bill for a "reasoning tokens" or "thinking tokens" line item. If it is large relative to the output tokens you actually see in responses, consider whether you are using extended thinking for tasks that do not require it.
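
The arithmetic in this section can be reproduced with a short sketch. It uses the $15 per million output price quoted for Claude Sonnet 4.6 and the 500-token visible response from the example; the helper names are hypothetical.

```python
def reasoning_cost(reasoning_tokens: int, output_price_per_million: float) -> float:
    """USD cost of reasoning tokens, which are billed at the output rate."""
    return reasoning_tokens * output_price_per_million / 1_000_000


def reasoning_share(reasoning_tokens: int, visible_output_tokens: int) -> float:
    """Fraction of billed output tokens that were hidden reasoning."""
    total = reasoning_tokens + visible_output_tokens
    return reasoning_tokens / total if total else 0.0


SONNET_OUTPUT_PRICE = 15.00  # USD per 1M output tokens, from the text above

for tokens in (5_000, 20_000):
    cost = reasoning_cost(tokens, SONNET_OUTPUT_PRICE)
    share = reasoning_share(tokens, 500)
    print(f"{tokens:>6} reasoning tokens: ${cost:.3f} ({share:.0%} of billed output)")
```

For the 500-token response in the example, reasoning accounts for roughly 91% to 98% of the billed output tokens, which is why this line item deserves a separate look.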

Batch API Discounts

Both Anthropic and OpenAI offer batch processing APIs that process requests asynchronously (typically within 24 hours) at 50% of standard pricing. If your bill shows batch API line items, those are your cheapest tokens. If you have tasks that do not require real-time responses (documentation generation, test writing, code analysis) and you are not using batch processing, you are paying twice as much as necessary for that work.
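
A back-of-the-envelope sketch of what the 50% discount is worth, assuming you can estimate what fraction of your spend comes from work that tolerates asynchronous processing. The function name and the example figures are hypothetical.

```python
BATCH_DISCOUNT = 0.50  # batch pricing is 50% of standard pricing (from the text)


def batch_savings(standard_spend: float, batchable_fraction: float) -> float:
    """Estimated USD saved by moving the batchable share of spend to a batch API."""
    if not 0.0 <= batchable_fraction <= 1.0:
        raise ValueError("batchable_fraction must be between 0 and 1")
    return standard_spend * batchable_fraction * BATCH_DISCOUNT


# Assumed example: $1,000 monthly spend, 40% of it on non-urgent work.
print(f"${batch_savings(1_000.00, 0.40):.2f} saved per month")
```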

Reading the Usage Dashboard

Beyond the invoice, both Anthropic and OpenAI provide usage dashboards that break down consumption by model, date, and API key. The most useful views:

Cost by model: Identifies which models are driving your bill. If Opus is your top cost driver but most of your tasks are routine, you have a model selection problem.
Cost by day: Identifies usage spikes. A day with 10x normal spending usually indicates a runaway agent loop or an accidental large context submission.
Input vs output ratio: A healthy ratio for most coding tasks is 3:1 to 5:1 input to output. If your output tokens exceed your input tokens, you may have prompts that are generating unnecessarily verbose responses.
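
If you export daily totals from the dashboard, the spike and ratio checks above can be automated along these lines. This is a minimal sketch: it treats the median day as "normal" spending, which is an assumption, and the sample figures and function names are made up for illustration.

```python
from statistics import median


def spike_days(daily_cost: dict[str, float], factor: float = 10.0) -> list[str]:
    """Return the days whose cost is at least `factor` times the median day."""
    if not daily_cost:
        return []
    baseline = median(daily_cost.values())
    return [day for day, cost in daily_cost.items() if cost >= factor * baseline]


def input_output_ratio(input_tokens: int, output_tokens: int) -> float:
    """Input-to-output token ratio; 3.0 to 5.0 is the range the text calls healthy."""
    return input_tokens / output_tokens


# Hypothetical daily costs in USD, keyed by date.
sample = {
    "2026-05-01": 12.0,
    "2026-05-02": 11.5,
    "2026-05-03": 130.0,
    "2026-05-04": 12.5,
    "2026-05-05": 13.0,
}
print(spike_days(sample))
print(input_output_ratio(40_000, 10_000))
```

The median is used rather than the mean so that the spike itself does not inflate the baseline it is being compared against.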

The Three Questions to Ask About Every Bill

When reviewing your AI API bill, three questions identify the biggest optimization opportunities:

Am I using the right model for each task? If Opus is your top cost driver, audit whether those tasks actually require Opus-level capability.
Am I using prompt caching? If you have no cache read tokens, you are paying full price for repeated context.
Do I have any runaway processes? A single day with 10x normal spending is a red flag for an agent loop or misconfigured automation.

Use the AI Cost Estimator to model what your workload should cost across different models, and compare that against your actual bill to identify where you are overspending.

