AI Cost Estimator

Estimate your AI coding costs

← Back to Blog

Implicit vs. Explicit Prompt Caching in 2026: Claude, Qwen3-Max, and DeepSeek Compared

May 26, 2026 · 7 min read

Prompt Caching Just Got More Universal

In May 2026, Alibaba's Qwen team announced that Qwen3-Max now supports implicit prompt caching — automatically enabled, no configuration required. For developers already using Qwen3-Max for coding tasks, the cost savings activate immediately without a single code change.

This makes Qwen3-Max the latest in a growing list of providers supporting automatic caching. But "prompt caching" is not a single feature — the implementation details vary significantly across providers, and those details determine how much you actually save. Here is a complete comparison.

Implicit vs. Explicit Caching: The Key Distinction

The core difference between implicit and explicit caching is control:

  • Implicit (automatic) caching: The provider caches frequently-repeated input prefixes automatically. You do not mark anything. The system decides what to cache based on usage patterns. Zero engineering effort, but you have no visibility into what is cached or when.
  • Explicit caching: You mark specific sections of the input with cache control markers. The provider caches exactly what you specify. Requires code changes, but gives you precise control over cache behavior, hit rates, and cost optimization.

Neither approach is strictly superior. Implicit caching is better when you want savings without engineering investment. Explicit caching is better when you need guaranteed high cache hit rates and maximum savings on large, stable contexts.

Provider-by-Provider Comparison

Provider Cache Type Cache Read Discount Cache TTL Min. Cacheable Tokens
Anthropic (Claude) Explicit 90% off input 5 minutes (refreshed on use) 1,024 tokens
OpenAI (GPT-5.x) Implicit 50% off input ~5–10 minutes 1,024 tokens
DeepSeek Implicit ~86% off input (V4 Flash) Several hours 64 tokens
Qwen3-Max Implicit (+ Explicit available) ~80% off input Session-based ~500 tokens
Google (Gemini) Explicit (Context Caching) ~75% off input 60 minutes (configurable) 32,768 tokens

Note: DeepSeek's cache read rate for V4-Flash is approximately $0.014/M versus the standard $0.112/M input — an 87.5% discount, making it the most aggressive cache pricing currently available among major providers.

When Explicit Caching Wins

Claude's explicit caching approach requires you to mark content with cache_control: {"type": "ephemeral"} in the API request, but delivers the highest discount (90% off) and gives you full control. This is the right choice when:

  • You have a large, stable system prompt (5,000+ tokens) that does not change between requests
  • You are feeding the same document or codebase context repeatedly across many API calls
  • Your application has high token volume and you need predictable, maximized savings
  • You want to track cache hit rates and know exactly what is being cached

The 90% discount on Claude versus 50% on OpenAI's implicit caching means that on large stable contexts, Claude's effective input cost after caching can actually be competitive with models that have lower headline input rates.

When Implicit Caching Wins

Implicit caching (OpenAI, Qwen3-Max, DeepSeek) is the right choice when:

  • You are prototyping or in early development and do not want to add caching infrastructure yet
  • Your prompts are somewhat variable and predicting cacheable prefixes is complex
  • You are using a third-party tool or library that does not expose cache control settings
  • The discount offered (50–87%) is sufficient for your budget without needing the full 90%

The Real-World Impact on a Coding Agent

For a coding agent making 1,000 calls per day with a 3,000-token system prompt and 20,000-token codebase context on each call, the monthly savings from effective caching are substantial:

  • Without caching (Claude Sonnet 4.6): 23,000 tokens × 1,000 calls × 30 days = 690M tokens × $3.00/M = $2,070/month
  • With 90% cache hit rate (Claude explicit): ~$207/month + one-time cache write costs ≈ $250/month total
  • Monthly savings: ~$1,820

Caching is not a minor optimization — it is often the largest single cost lever available. Use the AI Cost Estimator to calculate your specific savings based on your call volume, context size, and provider.

Want to calculate exact costs for your project?