The 30-Minute Minimum Cache Life: GPT-5.6's New Caching Economics Explained
June 27, 2026 · 9 min read
A New Default for OpenAI Caching
OpenAI's GPT-5.6 announcement on June 27, 2026 introduced three things its earlier models did not have: a 30-minute minimum cache life, explicit cache breakpoints, and a clearer write/read pricing contract. Pre-5.6 OpenAI caching was implicit prefix matching with a roughly 5-minute window: usable, but not predictable. The 5.6 contract puts OpenAI's caching on par with Anthropic's on price, with one structural advantage on session length: a 30-minute default cache life where Anthropic's default is 5 minutes.
The new contract spells out three terms, per OpenAI's announcement:
- Cache write cost: 1.25x the model's uncached input rate. Paid once, when the cache is established.
- Cache read cost: 0.1x the model's uncached input rate (90% discount). Paid each time you re-use the cached content.
- Cache TTL: 30-minute minimum. The cache persists at least 30 minutes, longer if memory allows.
Why 30 Minutes Matters
Real coding-agent sessions run long. OpenAI's own internal report (released the same day) found that 80.6% of Codex users have launched tasks longer than 30 minutes. The previous OpenAI 5-minute cache window was too short for that profile — long agent runs forced periodic cache re-warming, paying the 1.25x write cost multiple times per session.
At 30 minutes, the cache covers a typical "bug fix from filing through merged PR" workflow in a single TTL window. The cost per session drops because you pay the 1.25x write rate once and the 0.1x read rate on every later turn of the session.
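To see why the window length changes the number of cache writes, here is a minimal sketch. It assumes, which the announcement as summarized here does not confirm, that every request restarts the TTL clock and that any idle gap longer than the TTL forces a fresh write. The function name and the request timestamps are hypothetical.

```python
def count_cache_writes(request_minutes, ttl_minutes):
    """Count how many requests in one session pay the cache write rate.

    Assumption (not from the announcement): every request restarts the
    TTL clock, so a write is needed for the first request and for any
    request arriving more than ttl_minutes after the previous one.
    """
    writes = 0
    previous = None
    for t in sorted(request_minutes):
        if previous is None or t - previous > ttl_minutes:
            writes += 1
        previous = t
    return writes


# Hypothetical agent run: bursts of requests separated by long tool runs.
session = [0, 2, 3, 11, 12, 20, 21, 25]

print(count_cache_writes(session, ttl_minutes=5))   # 3
print(count_cache_writes(session, ttl_minutes=30))  # 1
```

With a 5-minute window the two 8-minute gaps each trigger a new write; with a 30-minute window the whole run is served from a single write.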
Concrete Cost Math for a Mid-Length Session
Take a realistic 25-minute session on Terra, priced here at $2.50 per million input tokens and $15 per million output tokens. The session runs 30 turns, with 25K input tokens per turn, of which 20K are cacheable (file context + system + tools) and 5K are dynamic (current message). Output is 4K tokens per turn.
Without caching:
- Input: 30 × 25K × $2.50/M = $1.875
- Output: 30 × 4K × $15/M = $1.80
- Total: $3.675 per session
With GPT-5.6 caching contract:
- Cached input written once: 20K × $2.50/M × 1.25 = $0.0625
- Cached input read 29 times: 29 × 20K × $2.50/M × 0.1 = $0.145
- Uncached input (dynamic 5K per turn): 30 × 5K × $2.50/M = $0.375
- Output: $1.80 (unchanged — output is not cached)
- Total: $2.38 per session — about 35% saving.
The savings come entirely from the input side because output is always uncached: input cost falls from $1.875 to $0.5825, a cut of about 69%, while output stays at $1.80. For workflows that lean heavily on long context (large codebases, multi-file refactors), input is a bigger share of total cost and caching saves more.
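The arithmetic above can be reproduced with a short helper, which also makes it easy to plug in your own session shape. This is a sketch, not an official calculator: the function name and parameters are made up for illustration, the rates are the ones used in this article, and it assumes the cache is written on the first turn and never expires mid-session.

```python
def session_cost(turns, cacheable_tokens, dynamic_tokens, output_tokens,
                 input_rate=2.50, output_rate=15.00,
                 write_mult=1.25, read_mult=0.10):
    """Return (uncached_total, cached_total) in dollars for one session.

    Rates are dollars per million tokens. Assumes turns >= 1, one cache
    write on the first turn, and a cache read on every later turn.
    """
    per_token_in = input_rate / 1_000_000
    per_token_out = output_rate / 1_000_000

    output_cost = turns * output_tokens * per_token_out
    uncached_input = turns * (cacheable_tokens + dynamic_tokens) * per_token_in

    cache_write = cacheable_tokens * per_token_in * write_mult
    cache_reads = (turns - 1) * cacheable_tokens * per_token_in * read_mult
    dynamic_input = turns * dynamic_tokens * per_token_in

    cached_total = cache_write + cache_reads + dynamic_input + output_cost
    return uncached_input + output_cost, cached_total


uncached, cached = session_cost(turns=30, cacheable_tokens=20_000,
                                dynamic_tokens=5_000, output_tokens=4_000)
print(f"uncached ${uncached:.4f}, cached ${cached:.4f}, "
      f"saving {1 - cached / uncached:.1%}")
# uncached $3.6750, cached $2.3825, saving 35.2%
```

Raising `cacheable_tokens` while holding the other inputs fixed shows how the saving grows as the stable context takes up more of each turn.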
How to Structure Prompts for the GPT-5.6 Contract
Three rules for capturing the full 35-55% savings:
1. Put stable content first. System prompt → tool definitions → retrieved code context → cache breakpoint → dynamic user message. The order matters because the cache key is the prefix up to your breakpoint.
2. Never let dynamic content sneak in before the breakpoint. If your system prompt includes a timestamp or a per-request user ID, the cache key changes every turn and you pay full price. Audit every variable that flows into the prefix.
3. Re-warm the cache before the TTL expires on long sessions. The 30-minute minimum means most sessions stay warm. For unusually long sessions (1+ hour), issue a cheap "ping" request to refresh the cache before it expires. A ping that reads the prefix at the 0.1x rate is cheaper than letting the cache expire and paying the 1.25x write cost again.
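The three rules can be sketched in a provider-neutral way. Nothing below is the GPT-5.6 request schema: the dict layout, `build_request`, `prefix_fingerprint`, and `needs_ping` are hypothetical names used only to show the idea of keeping the prefix byte-identical across turns and pinging shortly before an assumed 30-minute expiry.

```python
import hashlib
import json


def build_request(system_prompt, tool_definitions, code_context, user_message):
    """Split a request into a cacheable prefix and a dynamic suffix.

    The dict layout is illustrative only; it is not a real API schema.
    The cache breakpoint sits between the two returned lists.
    """
    prefix = [
        {"part": "system", "text": system_prompt},
        {"part": "tools", "text": json.dumps(tool_definitions, sort_keys=True)},
        {"part": "code_context", "text": code_context},
    ]
    suffix = [{"part": "user", "text": user_message}]
    return prefix, suffix


def prefix_fingerprint(prefix):
    """Hash the prefix so two turns can be compared for exact equality."""
    blob = json.dumps(prefix, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def needs_ping(seconds_since_last_request, ttl_seconds=30 * 60, margin_seconds=120):
    """True when the session has been idle almost as long as the assumed TTL."""
    return seconds_since_last_request >= ttl_seconds - margin_seconds


tools = [{"name": "read_file", "args": ["path"]}]
code = "def add(a, b):\n    return a + b\n"

# Rules 1 and 2 followed: only the user message changes between turns.
turn1, _ = build_request("You are a coding agent.", tools, code, "Fix the bug.")
turn2, _ = build_request("You are a coding agent.", tools, code, "Now add tests.")
print(prefix_fingerprint(turn1) == prefix_fingerprint(turn2))  # True

# Rule 2 broken: a timestamp in the system prompt changes the prefix.
stale, _ = build_request("You are a coding agent. Time: 10:00:01", tools, code, "Hi")
fresh, _ = build_request("You are a coding agent. Time: 10:00:02", tools, code, "Hi")
print(prefix_fingerprint(stale) == prefix_fingerprint(fresh))  # False

# Rule 3: ping only when the idle time approaches the TTL.
print(needs_ping(10 * 60))  # False
print(needs_ping(29 * 60))  # True
```

Logging the fingerprint on every turn is a cheap way to audit the prefix: if it changes when the files and tools have not, some variable has leaked in ahead of the breakpoint.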
How GPT-5.6 Caching Compares to Anthropic and Gemini
- vs Anthropic Claude: Same 90% read discount and 1.25x write cost. Anthropic's default TTL is 5 minutes but extends to 1 hour for an additional cost. For very long sessions (over 30 minutes consistently), Claude's 1-hour option still beats GPT-5.6.
- vs Gemini: Gemini uses automatic implicit caching with only a 25% discount. GPT-5.6 is meaningfully better for any team willing to engineer the prompt structure.
- vs Earlier OpenAI Models (GPT-5.5 and below): Significantly better. Previous OpenAI caching was implicit prefix matching with a ~5-minute window; the explicit-breakpoint contract in 5.6 is a meaningful upgrade.
When the Caching Math Doesn't Help
Three workflows where the 30-min cache life adds little:
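- Single-turn workloads. If your typical session is a single API call, you pay the 1.25x write cost and never read the cache back, so caching costs 25% more on the cached tokens. The premium is recovered by the first cache read, so a two-call session already comes out ahead, but the absolute saving is small.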
- Highly dynamic prompts. If your context changes substantially every turn (different files retrieved each time), there's no stable prefix to cache.
- Output-dominated tasks. When you're generating a 50-page document, output cost dominates and caching the small input contributes only marginal savings.
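The single-turn case is easy to check numerically. The sketch below uses the 1.25x write and 0.1x read multipliers from this article and assumes one write on the first call and a cache hit on every later call; the function name is hypothetical.

```python
def cached_to_uncached_ratio(calls, write_mult=1.25, read_mult=0.10):
    """Cost of the cacheable tokens with caching, relative to no caching.

    Assumes one write on the first call and a cache read on every later
    call. A ratio above 1.0 means caching costs more than it saves.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    cached = write_mult + (calls - 1) * read_mult
    return cached / calls


for calls in (1, 2, 5, 30):
    print(calls, round(cached_to_uncached_ratio(calls), 3))
# 1 1.25
# 2 0.675
# 5 0.33
# 30 0.138
```

Under these assumptions a lone call pays a 25% premium on the cached tokens, and the ratio falls toward the 0.1x read rate as the number of calls grows.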
Bottom Line
The 30-minute minimum cache life is the most important under-the-radar change in GPT-5.6. It moves OpenAI from "implicit caching that mostly works" to "explicit caching that predictably saves 35-55% on coding-agent sessions." For teams structuring prompts around this contract, the effective per-million-token cost on Terra and Sol drops well below the headline list price. Pair the new caching with the multi-tier Sol/Terra/Luna pricing and the GPT-5.6 family becomes cheaper to run at scale than its raw price card suggests.
Frequently Asked Questions
What does the 30-minute minimum cache life actually mean?
OpenAI guarantees that once you write content to the cache on a GPT-5.6 request, that content stays available at the 90%-discounted read rate for at least 30 minutes. If memory pressure allows, it may stay longer. The previous OpenAI cache window was effectively about 5 minutes, which forced long agent sessions to re-warm the cache multiple times.
How much can I save with GPT-5.6 caching versus running uncached?
On typical mid-length coding sessions (25K input / 4K output per turn, 30 turns, 80% cacheable input), savings land around 35%. For workflows with larger cacheable context (longer file lists, bigger system prompts), savings reach 50-55%. Pure output-dominated tasks see minimal savings because output is not cached.
Is GPT-5.6's caching better than Anthropic Claude's?
Roughly equivalent for typical sessions. Same 90% read discount, same 1.25x write cost. Claude's default TTL is 5 minutes but extends to 1 hour for an additional cost. GPT-5.6's 30-minute minimum is between the two. For sessions consistently over 30 minutes, Claude's 1-hour option is slightly better; for sessions in the 15-30 minute range, GPT-5.6 wins on the no-extra-config 30-minute default.
Do I need to change my prompt structure for GPT-5.6 caching to work?
Yes, lightly. Put stable content (system, tools, retrieved code) before dynamic content (current user message), and place a cache breakpoint right before the dynamic section. Avoid any variables (timestamps, request IDs) in the cacheable prefix. Existing Anthropic-structured prompts port to GPT-5.6 with little change.
When should I NOT bother with GPT-5.6 caching?
Three cases: single-turn workloads (the 1.25x write cost isn't amortized), highly dynamic prompts where the prefix changes every turn, and output-dominated tasks where input is a small fraction of total cost. For all multi-turn agent workflows reading the same files repeatedly, caching is essentially mandatory for cost-effective operation.