The Token Cost of AI Agent Failed Runs: How Much You're Really Paying for Retries and Rollbacks

June 26, 2026 · 9 min read

The Invisible Line Item

Open your AI coding agent's monthly token report and look at the total. Whatever you see, somewhere between 10% and 35% of that number is what we'll call the failed-run tax — tokens spent on agent attempts that didn't reach completion and were retried from a clean state.

The exact share depends on the agent, the task complexity, and your orchestration setup. But it's almost always real, almost always under-measured, and almost always cheaper to reduce than to absorb.

Why Agents Fail Mid-Run

Modern coding agents are multi-step workflows: plan → read files → search codebase → make changes → run tests → review → deploy. Each step has its own failure modes:

Context overflow: the agent's working context grows past its window, breaking the task.
Tool errors: a file edit conflicts, a command times out, a test runner crashes.
Rate limit hits: the upstream LLM API throttles mid-task.
Logical dead ends: the agent gets stuck in a loop or pursues a wrong approach until it gives up.
User interruption: the developer kills the agent because it's going wrong.

Each of these triggers a retry or restart. By default, the retry begins from scratch — full plan tokens, full file reads, full reasoning. The tokens spent on the first attempt are gone.

The Math at Production Scale

Take a representative agent workflow:

Average task: 5 steps (plan, read, write, test, deploy)
Average tokens per successful task: 35,000 input + 8,000 output
Per-step failure probability: 4% (compounds to a ~18% task-level failure rate across 5 steps: 1 - 0.96^5 ≈ 0.185)
Failed tasks restart from scratch under default behavior

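Expected tokens per user-facing task, assuming a failed attempt burns a full run's worth of tokens and a single retry then succeeds: the success path (82% of tasks) costs 43K tokens; the failure path (18%) costs 43K for the failed attempt plus 43K for the retry, or 86K. The average is 0.82 × 43K + 0.18 × 86K = 1.18 × 43K ≈ 50.7K tokens per task. The 18% failure rate inflates the average cost by 18%.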

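For a team running 10,000 agent tasks/month at Claude Sonnet rates ($3 input + $15 output per million tokens): one successful run costs 35,000 × $3/M + 8,000 × $15/M = $0.105 + $0.120 = $0.225, so success-only would cost $2,250/month. With the failed-run tax: $2,655/month. The difference, $405/month (an 18% markup on the success-only cost), is the failed-run tax.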
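
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch under the same simplifying assumptions as the example (independent step failures, a failed attempt that burns a full run, one retry that succeeds); the function names are illustrative, not part of any library.

```python
def task_failure_rate(step_failure_prob: float, steps: int) -> float:
    """Probability that at least one of `steps` independent steps fails."""
    return 1.0 - (1.0 - step_failure_prob) ** steps


def expected_tokens(tokens_per_run: int, failure_rate: float) -> float:
    """Average tokens per task when a failed attempt burns a full run
    and a single retry then succeeds (the simplification used above)."""
    return tokens_per_run * (1.0 + failure_rate)


def run_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one run at per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


failure_rate = task_failure_rate(step_failure_prob=0.04, steps=5)
tokens = expected_tokens(tokens_per_run=35_000 + 8_000, failure_rate=0.18)

tasks_per_month = 10_000
success_only = tasks_per_month * run_cost_usd(35_000, 8_000, 3.0, 15.0)
with_tax = success_only * 1.18  # 18% of tasks pay for a second full run

print(f"task-level failure rate: {failure_rate:.1%}")
print(f"expected tokens per task: {tokens:,.0f}")
print(f"success-only: ${success_only:,.0f}/month, with tax: ${with_tax:,.0f}/month")
```

Swapping in your own step count, failure probability, and prices shows how quickly the tax grows as workflows get longer.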

How to Measure Your Own Failed-Run Tax

Most teams don't measure this explicitly. The metrics to instrument:

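Task-level failure rate. What fraction of user-facing agent tasks succeed on the first attempt vs. require a retry? Tag each attempt with a task_id; the failure rate is the share of task_ids whose first attempt failed, and total attempts divided by distinct task_ids gives the average attempts per task.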

Tokens per attempt. Average tokens spent on attempts that succeeded vs. attempts that failed. Failed attempts typically consume 60-90% of a successful run's token count before bailing out.

Retry depth. How many attempts on average does each task need? Most stop at 1 retry; some run 3-5 retries before being abandoned.

Multiply: (tasks per month) × (failed attempts per task) × (tokens per failed attempt) × (price per token) = your monthly failed-run tax.
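
As a rough illustration, the three metrics can be computed from a log of attempts. The sketch below assumes a hypothetical `Attempt` record with `task_id`, `tokens`, and `succeeded` fields, listed in chronological order; your own logging schema will differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attempt:
    task_id: str      # shared by every attempt at the same user-facing task
    tokens: int       # input + output tokens consumed by this attempt
    succeeded: bool


def failed_run_metrics(attempts: list[Attempt]) -> dict[str, float]:
    """Summarize failure rate, retry depth, and wasted tokens.

    Assumes `attempts` is in chronological order, so the first record
    seen for a task_id is that task's first attempt.
    """
    by_task: dict[str, list[Attempt]] = defaultdict(list)
    for attempt in attempts:
        by_task[attempt.task_id].append(attempt)
    if not by_task:
        raise ValueError("no attempts to analyze")

    failed = [a for a in attempts if not a.succeeded]
    first_try_failures = sum(1 for runs in by_task.values() if not runs[0].succeeded)
    wasted = sum(a.tokens for a in failed)
    return {
        "task_failure_rate": first_try_failures / len(by_task),
        "attempts_per_task": len(attempts) / len(by_task),
        "avg_tokens_per_failed_attempt": wasted / len(failed) if failed else 0.0,
        "wasted_tokens": float(wasted),
    }


log = [
    Attempt("t1", 43_000, True),
    Attempt("t2", 30_000, False),
    Attempt("t2", 43_000, True),
    Attempt("t3", 20_000, False),
    Attempt("t3", 25_000, False),
    Attempt("t3", 43_000, True),
    Attempt("t4", 43_000, True),
]
metrics = failed_run_metrics(log)
print(metrics)
```

Multiplying `wasted_tokens` by your blended price per token gives the tax in dollars for the period the log covers.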

Three Patterns That Reduce the Tax

The failed-run tax is reducible. Three architectural patterns address it:

Pattern 1: Checkpoint-and-resume. Persist agent state at the end of each successful step. When a later step fails, the retry resumes from the last checkpoint instead of starting over. Reduces failed-run token waste from "redo everything" to "redo the failed step plus what came after." Typical savings: 40-70% of the failed-run tax.
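
A minimal sketch of the idea, assuming agent state is JSON-serializable and checkpoints live on local disk. The `run_with_checkpoints` helper and its file layout are hypothetical; a production setup would more likely use a database or the orchestration platform's own state store.

```python
import json
from pathlib import Path
from typing import Callable

# A step takes the accumulated state and returns the updated state.
Step = Callable[[dict], dict]


def run_with_checkpoints(task_id: str, steps: list[tuple[str, Step]],
                         checkpoint_dir: Path) -> dict:
    """Run named steps in order, skipping any already recorded as done."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / f"{task_id}.json"
    if path.exists():
        saved = json.loads(path.read_text())
    else:
        saved = {"done": [], "state": {}}
    done, state = saved["done"], saved["state"]

    for name, step in steps:
        if name in done:
            continue  # completed in an earlier attempt; do not pay for it again
        state = step(state)  # may raise; the last checkpoint stays on disk
        done.append(name)
        path.write_text(json.dumps({"done": done, "state": state}))
    return state
```

Calling `run_with_checkpoints` again with the same `task_id` after a failure re-runs only the failed step and the steps after it.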

Pattern 2: Saga rollback for side-effecting steps. Register compensation logic for each step that creates real side effects (commits, deploys, external API calls). When the workflow fails, run compensations in reverse to clean up — then resume from the last good state. Cloudflare Workflows added this pattern in June 2026. Typical savings: 60-80% of failed-run tax in workflows with significant side effects.
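
The shape of the pattern can be sketched in plain Python. This is a generic illustration, not the Cloudflare Workflows or Temporal API; the `Saga` class and the step names are hypothetical.

```python
from typing import Callable, Optional

Action = Callable[[], None]


class Saga:
    """Runs steps in order; on failure, runs registered compensations in reverse."""

    def __init__(self) -> None:
        self._compensations: list[tuple[str, Action]] = []

    def step(self, name: str, action: Action,
             compensate: Optional[Action] = None) -> None:
        try:
            action()
        except Exception:
            self.rollback()  # undo every earlier side effect, newest first
            raise
        if compensate is not None:
            # Register only after the action succeeded, so we never
            # compensate for a side effect that did not happen.
            self._compensations.append((name, compensate))

    def rollback(self) -> None:
        while self._compensations:
            _name, undo = self._compensations.pop()
            undo()


def failing_smoke_test() -> None:
    raise RuntimeError("smoke test failed")


events: list[str] = []
saga = Saga()
saga.step("commit", lambda: events.append("commit"),
          compensate=lambda: events.append("revert commit"))
saga.step("deploy", lambda: events.append("deploy"),
          compensate=lambda: events.append("undo deploy"))
try:
    saga.step("smoke_test", failing_smoke_test)
except RuntimeError:
    pass  # side effects are already cleaned up; safe to resume or retry

print(events)
```

Compensations run newest-first so that a deploy is undone before the commit it depends on is reverted.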

Pattern 3: Failure prediction and early termination. Train a small model or rules engine to detect when an agent is heading toward failure (looping, hallucinating, repeatedly hitting the same error). Terminate early and route to a different model or surface the issue to the developer. Reduces the tokens spent on doomed attempts. Typical savings: 20-40% of failed-run tax.
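
The rules-engine variant can start very small. The sketch below flags a run once the same event signature (for example, an identical error message) recurs too often within a sliding window; the `LoopDetector` class, the signature format, and the thresholds are all assumptions to tune against your own traces.

```python
from collections import Counter, deque


class LoopDetector:
    """Flags an agent run when one event repeats too often in recent history."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self._recent: deque[str] = deque(maxlen=window)
        self._max_repeats = max_repeats

    def observe(self, signature: str) -> bool:
        """Record one agent event; return True if the run looks stuck."""
        self._recent.append(signature)
        return Counter(self._recent)[signature] >= self._max_repeats


detector = LoopDetector(window=6, max_repeats=3)
trace = [
    "edit:a.py",
    "error:ImportError",
    "edit:a.py",
    "error:ImportError",
    "edit:b.py",
    "error:ImportError",  # third identical error inside the window
]
verdicts = [detector.observe(event) for event in trace]
print(verdicts)
```

When `observe` returns True, the orchestrator can stop the attempt and either route the task to a different model or surface it to the developer, rather than letting the agent keep spending tokens.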

Implementation Effort vs. Savings

Each pattern has a different cost-benefit profile:

Checkpoint-and-resume: 1-2 engineer-weeks to implement, ~$200/month maintenance. Saves 40-70% of failed-run tax. Easy first win for any team.

Saga rollback: 2-4 engineer-weeks if your orchestration platform doesn't support it natively (Cloudflare Workflows now does; Temporal has had it for years). Saves the most in workflows with real side effects.

Failure prediction: 3-6 engineer-weeks for a working detector, plus ongoing tuning. Lower savings per task but high value when your workflow includes long-running goal-mode agents that can burn 100K+ tokens before failing.

A Quick Heuristic for Prioritizing

Roughly, prioritize the three patterns this way:

If your agent tasks are short (under 10 steps): checkpoint-and-resume first.
If your tasks have side effects (commits, deploys, external API calls): saga rollback first.
If your tasks are long (goal-mode agents, multi-hour runs): failure prediction first.

Most production teams end up implementing all three over time. The order depends on which dominates the failure modes in their workflow.

The Larger Lesson

The failed-run tax is one of the most under-measured line items in AI coding budgets. Teams measure their monthly token spend, but they don't measure how much of that spend is on attempts that didn't deliver value. Once you start measuring, the optimization paths become clear, and the engineering investment to reduce the tax usually pays back in 3-6 months.

In an ecosystem where output token prices keep dropping but agent workflows keep getting longer and more complex, the marginal value of cutting the failed-run tax keeps rising. The teams that win on AI coding economics in 2027 will be the ones that measured and optimized this in 2026.

Bottom Line

Somewhere between 10% and 35% of your AI coding token bill is being spent on failed runs that retry from scratch. Three patterns — checkpoint-and-resume, saga rollback, and failure prediction — collectively cut the failed-run tax by 60-90%. Start measuring; the optimization paths become obvious once you can see them.

Frequently Asked Questions

What is the 'failed-run tax' in AI coding bills?

It's the share of your monthly AI coding token spend consumed by agent attempts that didn't reach completion and were retried from a clean state. The retry typically starts over from scratch — full plan tokens, file reads, reasoning — meaning the tokens spent on the failed attempt don't deliver value. The tax is usually 10-35% of total spend depending on agent type and workflow complexity.

How can I measure my own failed-run tax?

Three metrics to instrument: task-level failure rate (what fraction of user-facing tasks succeed on first attempt vs. need a retry?), tokens-per-failed-attempt (averaged across failed runs), and retry depth (how many attempts on average per task). Multiply failed attempts per task by tokens per failed attempt, your monthly task volume, and your per-token pricing to get the monthly failed-run tax in dollars.

What's the easiest pattern to reduce AI agent failed-run waste?

Checkpoint-and-resume. Persist agent state at the end of each successful step. When a later step fails, the retry resumes from the last checkpoint instead of starting over. Implementation cost is 1-2 engineer-weeks and savings are typically 40-70% of the failed-run tax — the highest return per engineer-hour of the three main patterns.

When should I use saga rollback for AI coding agents?

Saga rollback (registering compensation logic for each step with side effects) is most valuable when your agent tasks create real, expensive-to-undo side effects: git commits, deploys, external API calls, database writes. Cloudflare Workflows added this in June 2026; Temporal has supported it for years. Typical savings: 60-80% of failed-run tax in side-effect-heavy workflows.

How quickly does investment in reducing the failed-run tax pay back?

For mid-sized teams running 5,000-15,000 agent tasks/month at typical token rates, the engineering investment in checkpoint-and-resume or saga rollback usually pays back in 3-6 months on token savings alone. Failure prediction has a longer payback window because the engineering cost is higher and savings per task are smaller, but it's especially valuable for long-running goal-mode agents.
