AI Coding Cost Observability: How to Track Tokens by Agent, Tool, and Workflow

May 24, 2026 · 6 min read

The AI Coding Bill Is Not One Number

AI coding cost observability means tracking where token usage comes from, not just how many tokens were used. A monthly bill might say the team spent $800 on model usage, but that number is not actionable. Was the spend caused by code generation, code review, tests, long context, screenshots, MCP tools, or subagents?

Without attribution, teams either ignore the bill or overreact by restricting all AI usage. The better approach is to measure cost by workflow and outcome. Some expensive workflows are worth it. Some cheap workflows are pure waste because they produce no accepted code.

The Minimum Metrics to Capture

A useful cost dashboard does not need to be complicated. Start with a small set of fields that connect model usage to engineering work.

Metric | Why it matters
Model name | Different models have very different input and output prices.
Input and output tokens | Coding agents are usually input-heavy, but generated code can be output-heavy.
Agent or session ID | Links spend to the process that caused it.
Tool or MCP server | Shows whether external context sources are inflating prompts.
Repository and PR | Connects cost to shipped engineering work.
Outcome | Separates accepted work from discarded exploration.
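One way to capture these fields is a single record per model call. The sketch below is a minimal illustration, not a prescribed schema: the field names, the model names, and the prices in `PRICES_PER_MTOK` are all placeholders that you would replace with your own identifiers and your provider's published rates.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prices in USD per million tokens. These are placeholders,
# not real rates; substitute your provider's current pricing.
PRICES_PER_MTOK = {
    "premium-model": {"input": 10.00, "output": 30.00},
    "budget-model": {"input": 0.50, "output": 1.50},
}


@dataclass
class UsageRecord:
    """One model call, tagged with the fields from the metrics table."""
    model: str
    input_tokens: int
    output_tokens: int
    session_id: str              # agent or session that made the call
    tool: Optional[str]          # tool or MCP server involved, if any
    repository: str
    pull_request: Optional[str]  # PR identifier, if the work is tied to one
    outcome: str                 # e.g. "merged", "edited", "failed_tests", "abandoned"


def record_cost(record: UsageRecord) -> float:
    """Cost in USD, pricing input and output tokens separately."""
    price = PRICES_PER_MTOK[record.model]
    return (
        record.input_tokens * price["input"]
        + record.output_tokens * price["output"]
    ) / 1_000_000


# Example with made-up values: an input-heavy call on the premium model.
example = UsageRecord(
    model="premium-model",
    input_tokens=200_000,
    output_tokens=10_000,
    session_id="session-1",
    tool=None,
    repository="example-repo",
    pull_request=None,
    outcome="merged",
)
example_cost = record_cost(example)
```

Pricing input and output separately matters because the two rates usually differ, and a coding agent's token mix varies a lot from one call to the next.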

Track Workflows, Not Just Prompts

Prompt-level tracking is useful, but workflow-level tracking is where cost decisions become clear. A single pull request might include planning, editing, test generation, failure analysis, code review, documentation, and a final summary. Each step has a different expected return.

  • Planning: high reasoning value, usually modest output cost.
  • Repository exploration: high input cost, often repetitive.
  • Code generation: output-heavy and easy to measure by accepted diff.
  • Testing and debugging: can loop, so it needs explicit caps.
  • Review: often worth premium models because mistakes are expensive.
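The explicit cap mentioned for testing and debugging can be sketched as a small per-step budget. This is an illustrative assumption about how a cap might work, not a feature of any particular agent framework: `WorkflowBudget`, the step names, and the dollar amounts are all hypothetical.

```python
from collections import defaultdict


class WorkflowBudget:
    """Tracks spend per workflow step and signals when a step hits its cap."""

    def __init__(self, caps_usd):
        self.caps_usd = caps_usd              # step name -> cap in USD
        self.spent_usd = defaultdict(float)   # step name -> spend so far

    def charge(self, step, cost_usd):
        """Record spend; return False once the step has reached its cap."""
        self.spent_usd[step] += cost_usd
        cap = self.caps_usd.get(step)
        return cap is None or self.spent_usd[step] < cap


# Simulated debug loop: each attempt costs $0.75 against a $2.00 cap.
budget = WorkflowBudget({"debugging": 2.00})
attempts = 0
while True:
    attempts += 1
    if not budget.charge("debugging", 0.75):
        break  # cap reached: stop retrying and hand off to a human
```

Steps without an entry in `caps_usd` are tracked but never stopped, which suits steps such as planning where a runaway loop is less likely.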

The Best Unit: Cost per Accepted Change

The most useful metric is not cost per token. It is cost per accepted engineering outcome. For coding agents, that might be cost per merged pull request, cost per bug fixed, cost per test added, or cost per feature shipped. A $12 agent run that merges a production fix is cheap. A $1 agent run that produces discarded code is expensive.

To calculate this, tag each AI-assisted session with an outcome. Did the code ship? Was it edited heavily by a human? Did it fail tests? Did it get abandoned? Over time, this shows which workflows deserve premium models and which should use cheaper models or stricter prompts.
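A minimal sketch of that calculation follows, assuming each session is stored as a dictionary with a `cost_usd` and an `outcome` field (both names are illustrative). Note that the numerator includes the spend on discarded sessions, since that waste is part of what an accepted change really costs.

```python
def cost_per_accepted_change(sessions, accepted_outcomes=frozenset({"merged"})):
    """Total spend across all sessions divided by the number of accepted ones.

    Returns None when nothing was accepted, because the ratio is undefined.
    """
    total_usd = sum(s["cost_usd"] for s in sessions)
    accepted = sum(1 for s in sessions if s["outcome"] in accepted_outcomes)
    if accepted == 0:
        return None
    return total_usd / accepted


# Made-up sessions for illustration.
sessions = [
    {"cost_usd": 12.0, "outcome": "merged"},
    {"cost_usd": 1.0, "outcome": "abandoned"},
    {"cost_usd": 3.0, "outcome": "merged"},
]
result = cost_per_accepted_change(sessions)  # (12 + 1 + 3) / 2 accepted = 8.0
```

Which outcomes count as accepted is a team decision. Passing a different `accepted_outcomes` set, for example one that also includes heavily edited code, changes the denominator without changing the rest of the calculation.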

Common Cost Observability Mistakes

  • Tracking only total monthly spend.
  • Ignoring input tokens, even though coding agents read large contexts.
  • Not separating user-visible chat from background agents.
  • Counting generated lines of code without measuring accepted lines of code.
  • Failing to attribute MCP and tool output to the workflow that requested it.

A Simple Starting Dashboard

Start with five charts: spend by model, spend by repository, spend by workflow type, spend by tool or MCP server, and cost per merged pull request. That is enough to identify most waste. If one repository has unusually high cost per PR, inspect its context size. If one tool dominates spend, tighten the queries sent to it so it returns less context. If one model dominates cost without producing better outcomes, route routine tasks to a cheaper model.
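The data behind those five charts can come from one grouping function plus a per-repository ratio. The sketch below assumes usage records are dictionaries with the illustrative keys shown; the repository names, PR identifiers, and costs are made up.

```python
from collections import defaultdict


def spend_by(records, key):
    """Sum cost_usd grouped by one dimension (model, repository, workflow, tool)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)


def cost_per_merged_pr(records):
    """Per repository: all spend (merged or not) divided by distinct merged PRs."""
    spend = spend_by(records, "repository")
    merged_prs = defaultdict(set)
    for record in records:
        if record["outcome"] == "merged":
            merged_prs[record["repository"]].add(record["pull_request"])
    # Repositories with no merged PR are omitted rather than divided by zero.
    return {repo: spend[repo] / len(prs) for repo, prs in merged_prs.items()}


# Made-up records for illustration.
records = [
    {"model": "premium-model", "repository": "api", "workflow": "review",
     "tool": "none", "pull_request": "pr-a", "outcome": "merged", "cost_usd": 4.0},
    {"model": "budget-model", "repository": "api", "workflow": "code_generation",
     "tool": "search", "pull_request": "pr-a", "outcome": "merged", "cost_usd": 2.0},
    {"model": "premium-model", "repository": "web", "workflow": "debugging",
     "tool": "search", "pull_request": "pr-b", "outcome": "abandoned", "cost_usd": 6.0},
    {"model": "budget-model", "repository": "web", "workflow": "code_generation",
     "tool": "none", "pull_request": "pr-c", "outcome": "merged", "cost_usd": 1.5},
]

by_model = spend_by(records, "model")
by_repository = spend_by(records, "repository")
by_workflow = spend_by(records, "workflow")
by_tool = spend_by(records, "tool")
per_merged_pr = cost_per_merged_pr(records)
```

In this sample the `web` repository shows a higher cost per merged PR than `api` because an abandoned debugging session is charged against its single merged PR, which is the kind of signal the fifth chart is meant to surface.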

Once you know the workflow shape, use the AI Cost Estimator to compare what the same workload would cost on different models.
