← Back to Blog

Why OpenAI Codex Now Drives 99.8% of Internal Token Output: Lessons for Your Own AI Coding Bill

June 27, 2026 · 9 min read

Developer typing on a laptop with code on screen

A Number That Changes the Conversation

On June 27, 2026, OpenAI published an internal-usage report on how Codex changed the company's developer workflow. The headline figure: Codex now produces 99.8% of OpenAI's internal token output, up from less than 10% a year ago. The report also disclosed:

  • 80.6% of users have launched at least one task that ran longer than 30 minutes.
  • 99th-percentile users generate more than 60 hours of agent turns per day.
  • Codex is now embedded in every part of internal development — code review, testing, debugging, documentation, and product feature implementation.

The number that matters for external developers is not "99.8%" — it's what shape of workflow produces 99.8%. When a single tool moves from 10% to 99.8% of an organization's developer output in 12 months, the cost dynamics for everyone else watching are about to follow.

What 60 Hours of Agent Turns Per Day Actually Costs

For a 99th-percentile user running Codex against GPT-5.5 with 25K input / 5K output per agent turn, and roughly one turn per minute of agent work:

  • 60 hours of agent turns × 60 turns/hour = 3,600 turns per day
  • Per-turn cost on GPT-5.5 (uncached): $0.125 + $0.150 = $0.275
  • Daily cost: 3,600 × $0.275 = $990
  • Monthly cost: ~$30,000 per developer

That number drops to roughly $6,000/month with aggressive prompt caching (80% hit rate), or to $3,000/month if the workload moves to Terra at Terra GA. But the order of magnitude is clear: a heavy-usage Codex developer at OpenAI is running an AI bill that exceeds many engineers' salaries. This is the model OpenAI has internally normalized.

What This Says About AI Coding Economics

Three structural implications:

1. Token output, not commits, is becoming the metric. OpenAI's report explicitly tracks tokens generated rather than commits or PRs. If 99.8% of output is Codex, the meaningful productivity metric is no longer "lines of code per engineer" but "tokens per engineer." That changes hiring, performance review, and budget planning.

2. Long-running tasks dominate the bill. 80.6% of users have launched tasks longer than 30 minutes. Long agent runs accumulate context cost exponentially — every additional turn pays for the entire conversation history again. The savings from prompt caching and context engineering are not optional, they're the difference between a $30K/month dev cost and a $6K/month dev cost.

3. The Jevons paradox is real. Cheaper, more capable AI didn't cause OpenAI's engineers to write fewer lines per day. It caused them to write much more output via the AI. The same dynamic will unfold at every team that adopts Codex/Cursor/Claude Code seriously — total spending goes up even as cost-per-token falls.

Lessons for Your Own AI Coding Bill

Most teams will not reach OpenAI's internal usage levels — but the cost-control levers are the same:

Lever 1: Tier your model usage. Don't run every task on the flagship. OpenAI's report doesn't say everyone uses GPT-5.5 for everything — internal teams likely route low-stakes work to cheaper tiers. With GPT-5.6 Terra at half the price of 5.5 and Luna at $1/$6, the multi-tier strategy is the single biggest cost lever in 2026.

Lever 2: Aggressive prompt caching. Long agent runs are where caching pays the most. The 30-minute minimum cache life on GPT-5.6 is designed for exactly the 30+ minute task profile OpenAI's report highlights. Code that re-reads the same files across turns should be saving 70-85% of input cost on cache hits.

Lever 3: Output-tax aware design. Output tokens cost 5-10x more than input tokens at every tier. A prompt that produces a tighter response — even at the cost of slightly more input — almost always wins on total cost. Tell the agent to be concise. Cap output length where appropriate.

Lever 4: Failure-rate accounting. A 99.8% Codex output share at OpenAI almost certainly hides a meaningful retry/rollback rate. Every failed agent run is tokens spent without value. The cost-per-completed-task metric, not cost-per-run, is the right finance dashboard.

The "Internal Adoption Curve" Question

Going from less than 10% to 99.8% in one year is an extreme adoption curve. Most teams will move slower — partly because they don't have OpenAI's internal model access, partly because most production systems can't tolerate Codex-level output volume without serious refactoring of code review and CI workflows. A more realistic target for external teams over 12 months: 30-60% of token output from coding agents, with the rest from traditional developer typing and template-driven generation.

At those levels, AI coding bills are meaningful but not catastrophic. The 99.8% case is interesting as a signal for where things end up at maximum adoption, but it shouldn't be the planning assumption for next quarter's budget.

Bottom Line

OpenAI's 99.8%-Codex number is a marker of where heavy AI coding adoption leads — a single tool dominating organizational output, individual developers running $30K/month bills at the high end, and token output as a real productivity metric. For external teams, the practical takeaway is the cost-control playbook: tier your models, cache aggressively, treat output tokens as the expensive part, and measure by completed tasks not raw runs. Run those four levers and you can comfortably support 10x your current AI coding usage without 10x your bill.

Frequently Asked Questions

Did OpenAI really say 99.8% of internal token output comes from Codex?

Yes, per OpenAI's internal-usage report published on June 27, 2026. The figure is up from under 10% one year earlier. The report attributes the jump to Codex's expansion into every internal development workflow — code review, testing, debugging, documentation, and feature implementation.

How much does a 99th-percentile Codex user actually spend per month?

Roughly $30,000/month at GPT-5.5 prices with no caching, dropping to about $6,000/month with 80% cache hit rates, or roughly $3,000/month once GPT-5.6 Terra is GA (since Terra is half the price of 5.5). Most external developers will not hit 99th-percentile usage levels.

Why is OpenAI's internal Codex usage relevant to my own AI coding bill?

Because the cost-control levers are the same: tier your model usage, cache aggressively on long-running tasks, treat output tokens as the expensive part of every interaction, and measure cost per completed task instead of cost per run. Those four levers determine whether your monthly bill scales linearly with usage or whether it stays roughly flat as you adopt more AI coding agents.

Will my team also hit 99.8% Codex output share?

Almost certainly not — that's the maximum-adoption case for an AI-native company with full internal model access. A realistic 12-month target for most external teams is 30-60% of token output from coding agents, with the rest from traditional development. At those levels the cost dynamics are meaningful but manageable.

What's the single biggest lever for controlling AI coding cost in 2026?

Multi-tier model usage. With GPT-5.6 Sol/Terra/Luna and similar three-tier lineups from Anthropic and Google, the difference between running everything on the flagship vs routing intelligently across tiers is roughly 5-10x in total bill. No other lever — caching, prompt engineering, output-length capping — gets you close to that magnitude of savings on its own.

Want to calculate exact costs for your project?