How to Calculate Cost per AI Agent Task: A Practical Formula for Developers

By Eric Bush · May 21, 2026 · 6 min read

Cost per Prompt Is the Wrong Metric

Developers often ask how much an AI coding prompt costs. That number is useful, but it is not the metric that matters. The better metric is cost per completed agent task: the total model spend required to finish a bug fix, generate tests, migrate a component, review a pull request, or ship a feature.

A cheap prompt can become expensive if it leads to five retries. An expensive premium-model prompt can be cheap if it solves a difficult task in one pass. Task-level accounting is the only fair way to compare models, tools, and workflows.

The Basic Formula

The direct token cost of an AI agent task is straightforward:

Task cost = input tokens × input price + output tokens × output price

Because most providers quote prices per million tokens, divide token counts by 1,000,000 before multiplying. For example, 200,000 input tokens on a $3/M input model cost $0.60, and 40,000 output tokens on a $15/M output model cost another $0.60. The total direct token cost is $1.20.
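The formula translates directly into a few lines of Python. This is a minimal sketch: the function name `token_cost` and its parameter names are illustrative, and the prices are the example figures from the paragraph above, not any specific provider's rates.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Direct token cost in dollars. Prices are quoted per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Worked example from the text: 200,000 input tokens at $3/M
# plus 40,000 output tokens at $15/M.
cost = token_cost(200_000, 40_000, input_price_per_m=3.0, output_price_per_m=15.0)
print(f"${cost:.2f}")  # $1.20
```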

Cost component	What to count
Input tokens	Prompts, files, logs, diffs, tool output, history
Output tokens	Code, explanations, plans, tests, summaries
Retries	Failed attempts, test-fix loops, review changes
Parallel agents	Research, implementation, QA, security review workers
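When a task fans out to parallel agents, each worker's tokens count toward the same task. The sketch below sums per-worker usage. The worker names, token counts, and the $3/M and $15/M prices are all hypothetical values chosen for illustration, and it assumes every worker runs on the same model.

```python
# Hypothetical prices in dollars per million tokens.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0

# Hypothetical (input_tokens, output_tokens) per parallel worker.
workers = {
    "research": (120_000, 8_000),
    "implementation": (200_000, 40_000),
    "qa": (80_000, 12_000),
}


def worker_cost(input_tokens: int, output_tokens: int) -> float:
    """Direct token cost in dollars for one worker."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M


# The task cost is the sum over every worker that contributed to it.
total = sum(worker_cost(inp, out) for inp, out in workers.values())
for name, (inp, out) in workers.items():
    print(f"{name}: ${worker_cost(inp, out):.2f}")
print(f"task total: ${total:.2f}")
```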

Add the Retry Multiplier

Most real coding tasks do not finish on the first turn. The agent writes a patch, runs tests, sees failures, edits again, and responds to reviewer feedback. That retry loop is why cost per task can be several times higher than cost per initial prompt.

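Capture that loop as a retry multiplier: total token spend across all attempts divided by the spend on the first attempt. A simple bug fix may have a retry multiplier of 1.2. A multi-file refactor may be 2.0. A migration with hidden tests may be 3.0 or higher. The multiplier should reflect your actual workflow, not an optimistic demo.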
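Applying the multiplier is a single multiplication. The sketch below uses the example multipliers from this section and the $1.20 first-pass cost from the earlier worked example. It assumes the multiplier scales first-pass cost linearly, which is a simplification: in practice later turns often carry more context and cost more per turn. The function name is illustrative.

```python
def cost_with_retries(first_pass_cost: float, retry_multiplier: float) -> float:
    """Estimated task cost after retries, assuming linear scaling."""
    if retry_multiplier < 1.0:
        raise ValueError("retry multiplier cannot be below 1.0")
    return first_pass_cost * retry_multiplier


FIRST_PASS_COST = 1.20  # dollars, from the earlier worked example

# Example multipliers from the text (the migration figure is a lower bound).
multipliers = {
    "simple bug fix": 1.2,
    "multi-file refactor": 2.0,
    "migration with hidden tests": 3.0,
}

for task, multiplier in multipliers.items():
    print(f"{task}: ${cost_with_retries(FIRST_PASS_COST, multiplier):.2f}")
```

With these inputs the estimates come out to $1.44, $2.40, and $3.60.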

Do Not Ignore Human Review

Token cost is only part of the economics. If a cheap model produces code that takes 45 minutes to review, it may be more expensive than a premium model that produces a clean patch in 10 minutes. For teams, the best metric is often fully loaded task cost: token spend plus human review time.
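A rough sketch of fully loaded task cost follows. The 45-minute and 10-minute review times come from the paragraph above. The $100/hour reviewer rate and the two token costs ($0.30 and $1.20) are assumptions chosen only to make the comparison concrete, so substitute your own figures.

```python
def fully_loaded_cost(token_cost: float, review_minutes: float,
                      reviewer_hourly_rate: float) -> float:
    """Token spend plus the cost of human review time, in dollars."""
    return token_cost + (review_minutes / 60) * reviewer_hourly_rate


REVIEWER_RATE = 100.0  # hypothetical dollars per hour

# Hypothetical token costs; review times are the examples from the text.
cheap_model = fully_loaded_cost(0.30, review_minutes=45,
                                reviewer_hourly_rate=REVIEWER_RATE)
premium_model = fully_loaded_cost(1.20, review_minutes=10,
                                  reviewer_hourly_rate=REVIEWER_RATE)

print(f"cheap model:   ${cheap_model:.2f}")    # $75.30
print(f"premium model: ${premium_model:.2f}")  # $17.87
```

Under these assumptions, review time dominates token spend, and the model with four times the token cost is the cheaper option overall.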

You do not need a perfect accounting system. Even rough estimates help. Track agent task type, model used, number of turns, whether tests passed, and review time. Patterns appear quickly.
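A lightweight log can be as simple as one record per task. The sketch below uses the fields listed above plus a token cost field. The `TaskRecord` class, the model names, and the sample values are all hypothetical, and it treats "tests passed" as a stand-in for "task completed", which may be too loose for your team.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    task_type: str
    model: str
    turns: int
    tests_passed: bool
    review_minutes: float
    token_cost: float  # dollars


def cost_per_completed_task(records):
    """Total token spend divided by completed tasks, per (task type, model).

    Spend on failed attempts still counts toward the total, so failures
    raise the cost of each completed task.
    """
    spend = defaultdict(float)
    completed = defaultdict(int)
    for r in records:
        key = (r.task_type, r.model)
        spend[key] += r.token_cost
        if r.tests_passed:
            completed[key] += 1
    # Skip groups with no completed tasks to avoid dividing by zero.
    return {key: spend[key] / completed[key]
            for key in spend if completed[key] > 0}


# Hypothetical sample data.
records = [
    TaskRecord("bug_fix", "model-a", 3, True, 10, 0.40),
    TaskRecord("bug_fix", "model-a", 6, False, 0, 0.60),
    TaskRecord("bug_fix", "model-a", 4, True, 15, 0.50),
    TaskRecord("refactor", "model-b", 9, True, 30, 2.40),
]

result = cost_per_completed_task(records)
for (task_type, model), cost in result.items():
    print(f"{task_type} / {model}: ${cost:.2f} per completed task")
```

Here the three bug-fix attempts cost $1.50 in total but only two completed, so the cost per completed bug fix is $0.75 rather than the $0.50 average per attempt.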

Bottom Line

Cost per AI agent task is the right way to compare coding models. Count input tokens, output tokens, retries, parallel agents, and review time. Then compare the final cost per completed engineering outcome.

Use the AI Cost Estimator to model different task sizes and see how model choice changes total spend.
