How Many Tokens Does an AI Coding Agent Use Per Session? Real Data Breakdown

By Eric Bush · June 18, 2026 · 8 min read

Why Token Usage Matters More Than You Think

Most developers starting with AI coding agents focus on the per-token price — $3/M input for Sonnet 4.6, $5/M for Opus 4.8. But the real cost driver is how many tokens each session actually consumes. A 30-minute Claude Code session can easily burn through 50,000 to 150,000 input tokens and 10,000 to 30,000 output tokens. That range means your actual cost per session can vary by 3x or more depending on how you work.

Understanding where those tokens go — and why input tokens so heavily outweigh output — is the key to controlling AI coding costs. This breakdown uses real session data to show exactly what happens inside a typical coding agent interaction.

The pattern is consistent across all token-based agents: input tokens grow with every turn, while output tokens stay relatively constant. This compounding effect is why a 20-turn session doesn't cost 2x a 10-turn session — it can cost 3-4x.

Anatomy of Token Consumption: Where Do Tokens Go?

Every API call to an AI coding agent includes several components that all count as input tokens. The system prompt alone — the instructions that tell the model how to behave as a coding agent — typically consumes 2,000 to 8,000 tokens. This is sent with every single turn.

Next comes codebase context: files the agent reads, search results, directory listings, and tool outputs. When Claude Code reads a 500-line file, that's roughly 3,000-5,000 tokens added to the context. A session that touches 10 files might load 30,000-50,000 tokens of code context alone.

Then there's the conversation history — and this is where compounding kicks in. Every previous message (both your prompts and the agent's responses) gets re-sent as input on each new turn. Turn 1 might send 10K tokens total. Turn 10 sends all of turns 1-9 plus the new message. Turn 20 includes the entire conversation so far.

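The formula for total input tokens in a session is approximately: Total Input = Σ(base_context + conversation_history_at_turn_n) for each turn n. With a base context of 10K tokens and history that grows by about 5K per turn (so roughly 5K × n tokens of history at turn n), a 20-turn session accumulates roughly: 20 × 10K + (5K × 20 × 21 / 2) = 200K + 1,050K = 1.25M total input tokens processed across all API calls. Under the same assumptions a 10-turn session accumulates 375K, so doubling the turn count multiplies input by about 3.3x.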
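The arithmetic above can be reproduced with a short sketch. The function name and its defaults are illustrative, and the model is the simplified one from the text (fixed base context, linear history growth), not a measurement of any particular agent.

```python
def cumulative_input_tokens(turns: int, base_context: int = 10_000,
                            growth_per_turn: int = 5_000) -> int:
    """Sum the input tokens billed over every API call in one session.

    Assumes turn n sends the base context plus n * growth_per_turn tokens
    of accumulated history (the simplified model from the text).
    """
    return sum(base_context + growth_per_turn * n for n in range(1, turns + 1))


ten_turns = cumulative_input_tokens(10)
twenty_turns = cumulative_input_tokens(20)

print(f"10 turns: {ten_turns:,} input tokens")      # 10 turns: 375,000 input tokens
print(f"20 turns: {twenty_turns:,} input tokens")   # 20 turns: 1,250,000 input tokens
print(f"ratio: {twenty_turns / ten_turns:.2f}x")    # ratio: 3.33x
```

Changing `base_context` or `growth_per_turn` to match your own sessions shows how sensitive the total is to history growth rather than to the fixed base.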

Real Session Data: Token Usage by Agent Type

Different agent architectures consume tokens at very different rates. Here's what real-world usage looks like across the three main categories:

Chat-based assistants (ChatGPT, Claude.ai web) use the fewest tokens per session. A typical 10-turn conversation consumes 30K-80K input tokens and 5K-15K output tokens. Context is limited to what you paste in, and the model doesn't read files autonomously. Cost per session with Sonnet 4.6: approximately $0.10-$0.30.

CLI agents (Claude Code, Aider, Codex CLI) sit in the middle tier. A 30-minute session with 15-20 turns typically consumes 50K-200K input tokens and 10K-40K output tokens. These agents read files, run commands, and maintain tool-call history — all of which adds to context. Cost per session with Sonnet 4.6: approximately $0.30-$1.20. With Opus 4.8: $0.50-$2.00.

Autonomous sandbox agents (Devin, OpenAI Codex agent mode) consume the most. These agents run multi-step plans autonomously, often making 30-100+ tool calls per task. A single task can consume 200K-800K input tokens and 30K-100K output tokens. The agent loops through read-edit-test cycles without human intervention, accumulating massive context. Cost per task with frontier models: $2-$15+.

Model choice multiplies all of these numbers. Compare DeepSeek V4 Flash at $0.10/$0.20 per million tokens with Claude Opus 4.8 at $5/$25: taking an illustrative 500K-token session split into 400K input and 100K output, the cost is about $0.06 on DeepSeek and $4.50 on Opus, a 75x difference for the same token consumption.
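To see how the per-million prices translate into a session cost, here is a minimal sketch. The prices are the ones quoted in this article; the dictionary keys are just labels, and the 400K input / 100K output split is an assumption chosen for illustration, since the result depends heavily on that split.

```python
# Prices in USD per million tokens, as quoted in this article: (input, output).
# The keys are illustrative labels, not official API model identifiers.
PRICES = {
    "deepseek-v4-flash": (0.10, 0.20),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at the given model's per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical split of a 500K-token session: 400K input, 100K output.
for model in PRICES:
    print(f"{model}: ${session_cost(model, 400_000, 100_000):.2f}")
# deepseek-v4-flash: $0.06
# sonnet-4.6: $2.70
# opus-4.8: $4.50
```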

The Compounding Problem: Why Long Sessions Get Expensive

Here's a turn-by-turn breakdown of a real Claude Code session working on a React component refactor, using Sonnet 4.6 ($3/$15 per million tokens):

Turns 1-5: Base context (8K) + growing history. Cumulative input: ~60K tokens. Cost so far: $0.18 input + $0.15 output = $0.33.

Turns 6-10: Context now includes file reads and prior edits. Cumulative input: ~180K tokens. Cost so far: $0.54 + $0.30 = $0.84.

Turns 11-15: History dominates the context window. Cumulative input: ~380K tokens. Cost so far: $1.14 + $0.45 = $1.59.

Turns 16-20: Approaching context limits, possible compaction. Cumulative input: ~650K tokens. Cost so far: $1.95 + $0.60 = $2.55.

Notice the pattern: the last 5 turns cost more than the first 10 combined. This is quadratic growth in action. The practical implication is clear — shorter, focused sessions are dramatically cheaper than long exploratory ones.
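The claim about the last five turns can be checked directly from the cumulative figures in the breakdown. This is a small sketch: the input token counts come from the text, while the output token counts (10K per five-turn block) are inferred from the output dollar figures at $15 per million and are not stated in the article.

```python
INPUT_PRICE = 3.00    # USD per million input tokens (Sonnet 4.6, per the article)
OUTPUT_PRICE = 15.00  # USD per million output tokens

# (last turn in checkpoint, cumulative input tokens, cumulative output tokens).
# Output token counts are inferred from the article's output dollar figures.
CHECKPOINTS = [
    (5, 60_000, 10_000),
    (10, 180_000, 20_000),
    (15, 380_000, 30_000),
    (20, 650_000, 40_000),
]


def cumulative_cost(input_tokens: int, output_tokens: int) -> float:
    """Cumulative session cost in USD, rounded to cents."""
    total = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return round(total, 2)


costs = [cumulative_cost(i, o) for _, i, o in CHECKPOINTS]
print(costs)  # [0.33, 0.84, 1.59, 2.55]

first_ten = costs[1]                        # cost of turns 1-10
last_five = round(costs[3] - costs[2], 2)   # cost of turns 16-20 alone
print(first_ten, last_five)  # 0.84 0.96
```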

Some agents implement context compaction — summarizing older conversation history to reduce token count. Claude Code, for instance, automatically compacts when approaching context limits. This can reduce costs by 30-50% on very long sessions, but the summarization itself costs tokens and may lose important details.

Practical Strategies to Reduce Token Consumption

Start new sessions frequently. Instead of one 30-turn session, break work into three 10-turn sessions. You lose some context but avoid the quadratic cost growth. This alone can cut costs by 40-60%.
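A rough sketch of why splitting helps, using the same simplified model as earlier (10K base context, 5K of history growth per turn; both are illustrative assumptions). It ignores the tokens spent re-establishing context at the start of each fresh session, so the savings it prints are an upper bound.

```python
def cumulative_input_tokens(turns: int, base_context: int = 10_000,
                            growth_per_turn: int = 5_000) -> int:
    """Input tokens billed across one session under the simplified model."""
    return sum(base_context + growth_per_turn * n for n in range(1, turns + 1))


one_long = cumulative_input_tokens(30)          # a single 30-turn session
three_short = 3 * cumulative_input_tokens(10)   # three fresh 10-turn sessions
savings = 1 - three_short / one_long

print(f"{one_long:,}")      # 2,625,000
print(f"{three_short:,}")   # 1,125,000
print(f"{savings:.0%}")     # 57%
```

The result sits at the upper end of the 40-60% range quoted above; any context that has to be reloaded in the new sessions pulls the real figure down.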

Be specific in your prompts. Vague requests like "fix this file" force the agent to read more code and make more attempts. Specific requests like "fix the null check on line 42 of auth.ts" require less context and fewer turns.

Choose the right model for the task. Use Opus 4.8 or GPT-5.5 only for complex reasoning tasks. For straightforward edits, Sonnet 4.6 or even DeepSeek V4 Flash handles the work at a fraction of the cost. A bug fix that takes Opus 3 turns might take DeepSeek 5 turns — but at $0.10/$0.20 vs $5/$25, the cheaper model wins on total cost even with more turns.
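The 3-turn versus 5-turn comparison can be made concrete with a small sketch. The per-turn token sizes (10K base context, 5K history growth, 2K output) are illustrative assumptions, not measurements; only the prices come from the article.

```python
def task_cost(turns: int, input_price: float, output_price: float,
              base_context: int = 10_000, growth_per_turn: int = 5_000,
              output_per_turn: int = 2_000) -> float:
    """USD cost of a task; prices are per million tokens.

    Token sizes are illustrative assumptions, not measurements.
    """
    input_tokens = sum(base_context + growth_per_turn * n
                       for n in range(1, turns + 1))
    output_tokens = turns * output_per_turn
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


opus = task_cost(turns=3, input_price=5.00, output_price=25.00)
deepseek = task_cost(turns=5, input_price=0.10, output_price=0.20)

print(f"Opus 4.8, 3 turns:          ${opus:.4f}")      # $0.4500
print(f"DeepSeek V4 Flash, 5 turns: ${deepseek:.4f}")  # $0.0145
```

Under these assumptions the cheaper model would need far more than two extra turns before the totals cross, though the sketch says nothing about whether the fix it produces is equally good.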

Use the prediction formula for budgeting: Expected cost = (turns × average_tokens_per_turn × price_per_token) × 1.5 (for the compounding overhead). For a 15-turn Sonnet session: 15 × 15K × $3/M × 1.5 = $1.01 input, plus output at roughly 15 × 2K × $15/M = $0.45. Total: approximately $1.46.
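The budgeting formula is easy to wrap in a helper. This is a minimal sketch: the function and parameter names are made up for illustration, and the 1.5 compounding factor is the article's rule of thumb rather than a derived constant.

```python
def expected_session_cost(turns: int, avg_input_per_turn: int, avg_output_per_turn: int,
                          input_price: float, output_price: float,
                          compounding_factor: float = 1.5) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD; prices are per million tokens.

    Only the input side is scaled by the compounding factor, as in the text.
    """
    input_cost = turns * avg_input_per_turn * input_price / 1_000_000 * compounding_factor
    output_cost = turns * avg_output_per_turn * output_price / 1_000_000
    return input_cost, output_cost


input_cost, output_cost = expected_session_cost(
    turns=15, avg_input_per_turn=15_000, avg_output_per_turn=2_000,
    input_price=3.00, output_price=15.00,
)
print(f"input ${input_cost:.2f} + output ${output_cost:.2f} "
      f"= ${input_cost + output_cost:.2f}")
# input $1.01 + output $0.45 = $1.46
```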

Frequently Asked Questions

How many tokens does a typical Claude Code session use?

A typical 30-minute Claude Code session with 15-20 turns consumes 50,000-200,000 input tokens and 10,000-40,000 output tokens. The exact amount depends on how many files are read, the length of your prompts, and the complexity of the agent's responses.

Why are input tokens so much higher than output tokens?

Input tokens include the system prompt, all files the agent reads, tool call results, and the entire conversation history, which is re-sent every turn. Output tokens are just the agent's new response each turn. Because the history grows with each turn, per-turn input grows roughly linearly and the cumulative input across a session grows roughly quadratically.

How much does a single AI coding session cost?

With Claude Sonnet 4.6, a typical 15-20 turn session costs $0.50-$2.50. Opus 4.8 rates ($5/$25) are exactly 5/3 of Sonnet's ($3/$15), so the same session costs about $0.85-$4.20. With budget models like DeepSeek V4 Flash ($0.10/$0.20), the same token volume costs roughly $0.01-$0.08. The model choice matters more than session length for total cost.

Do longer sessions always cost more per turn?

Generally, yes. Because conversation history compounds, each successive turn costs more than the previous one: more context must be re-sent. Turn 20 in a session sends all of turns 1-19 as input. The main exception is when the agent compacts older history, which shrinks the context for the turns that follow. Breaking work into shorter sessions avoids this compounding.

What's the cheapest way to use AI coding agents?

Use budget models (DeepSeek V4 Flash at $0.10/$0.20, Qwen3 30B at $0.08/$0.28) for routine tasks, keep sessions short (under 10 turns), be specific in prompts to reduce unnecessary file reads, and reserve expensive frontier models only for complex reasoning tasks.
