
Meituan LongCat 2.0 at $0.75/$2.95: 1.6T MoE Reshapes Agent Coding Economics

By Eric Bush · July 1, 2026 · 8 min read


The Headline Numbers

Meituan's LongCat team released LongCat 2.0 on June 30, 2026: a 1.6-trillion-parameter mixture-of-experts model with roughly 48B active parameters per token. It ships with native 1M context and pricing that lands between the cheap tier and Claude Sonnet 5 (USD per million tokens):

| Model | Input ($/M tokens) | Output ($/M tokens) | Context |
| --- | --- | --- | --- |
| LongCat 2.0 | $0.75 | $2.95 | 1M |
| DeepSeek V4-Flash | $0.14 | $0.42 | 128K |
| Qwen 3.6 27B | $0.20 | $0.80 | 256K |
| Claude Sonnet 5 (promo) | $2.00 | $10.00 | 200K |

LongCat 2.0 is not the cheapest; DeepSeek V4-Flash still wins on raw price. What it offers is a middle tier with things DeepSeek V4-Flash lacks: 1M native context, agent-tuned training, and the LSA (Linear Sparse Attention) mechanism, which keeps 1M context from turning into a wall of latency.

Where 1M Context Actually Pays Off

Most coding tasks fit comfortably in 128K. The 1M window is not for a bigger prompt; it's for a longer agent trajectory. A Claude Code session that lasts 90 minutes can pass through 200–400K tokens of accumulated conversation and tool output. On models with 128K windows, that means periodic context compaction or agent restarts. On a 1M-native model, the agent runs to natural completion.

The cost implication: avoiding a mid-session restart saves ~20–30% of a session's tokens, because the restart re-reads system prompts, re-indexes files, and reproduces state the previous session held. LongCat 2.0's price-per-token is several times higher than DeepSeek's, so eliminating one restart per session narrows the cost gap rather than closing it. The net-cost case for LongCat holds only when restarts also cost retries, lost state, or failed tasks.
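To make the restart trade-off concrete, here is a minimal sketch of the break-even arithmetic. It assumes a deliberately simple model that is not from Meituan or any provider: a session without restarts uses (1 - s) times the tokens of a session with a restart, where s is the 20–30% savings quoted above. The function name `break_even_price_ratio` is hypothetical.

```python
def break_even_price_ratio(restart_savings: float) -> float:
    """Highest per-token price ratio (long-context model / short-context model)
    at which avoiding a restart still pays for itself on token cost alone.

    restart_savings: fraction of a session's tokens saved by not restarting.
    Assumption: tokens_without_restart = tokens_with_restart * (1 - restart_savings).
    """
    if not 0 <= restart_savings < 1:
        raise ValueError("restart_savings must be in [0, 1)")
    return 1 / (1 - restart_savings)


# The 20% and 30% figures are the savings range quoted in this post.
for s in (0.20, 0.30):
    ratio = break_even_price_ratio(s)
    print(f"savings {s:.0%}: break-even at {ratio:.2f}x the per-token price")
```

Under this model, token savings alone justify a price premium of roughly 1.25× to 1.43×. LongCat 2.0's listed input price is about 5.4× DeepSeek V4-Flash's ($0.75 vs $0.14), so the rest of the argument has to come from fewer retries and better task completion, which this sketch does not capture.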

Per-Task Cost: Real Numbers

Take a common agent task: implement a feature that touches 6 files, requires two rounds of test runs, and consumes 800K input tokens and 60K output tokens across the trajectory. The math:

| Model | Input cost | Output cost | Total |
| --- | --- | --- | --- |
| LongCat 2.0 | $0.60 | $0.18 | $0.78 |
| DeepSeek V4-Flash | $0.11 | $0.03 | $0.14 (+ restart overhead) |
| Qwen 3.6 27B | $0.16 | $0.05 | $0.21 |
| Claude Sonnet 5 (promo) | $1.60 | $0.60 | $2.20 |

LongCat 2.0 sits at roughly 1/3 the cost of Sonnet 5 promo and about 5.5× the cost of DeepSeek V4-Flash before restart overhead. The bet: LongCat's agent-specific training closes the quality gap with Sonnet 5 on execution tasks, while DeepSeek needs more retries at long horizons.
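The per-task figures can be reproduced with a few lines of Python. This is a sketch that uses only the list prices and token counts quoted in this post; the names `PRICES` and `task_cost` are illustrative, and it ignores caching, restart overhead, and any provider-specific billing rules.

```python
# USD per million tokens (input, output), copied from the pricing table in this post.
PRICES = {
    "LongCat 2.0": (0.75, 2.95),
    "DeepSeek V4-Flash": (0.14, 0.42),
    "Qwen 3.6 27B": (0.20, 0.80),
    "Claude Sonnet 5 (promo)": (2.00, 10.00),
}


def task_cost(input_tokens, output_tokens, input_price, output_price):
    """Return (input_cost, output_cost, total) in USD for one agent task."""
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return input_cost, output_cost, input_cost + output_cost


# The example task from this post: 800K input tokens, 60K output tokens.
costs = {name: task_cost(800_000, 60_000, *price) for name, price in PRICES.items()}

for name, (inp, out, total) in costs.items():
    print(f"{name:<24} input ${inp:.2f}  output ${out:.2f}  total ${total:.2f}")
```

Swapping in token counts from your own agent traces is the quickest way to see where a given workload lands.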

LSA and Why the KV Cache Matters

The technical detail worth caring about: LongCat 2.0 uses Linear Sparse Attention (LSA), which reduces the quadratic cost of long context to something closer to linear on the attention path. On providers that charge for KV cache storage separately (or where cache eviction hurts you), this changes the math. Meituan claims 1M context runs at sub-linear cost overhead compared to 128K on standard attention.
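To see why the scaling exponent matters at this size, here is a back-of-the-envelope sketch. It does not model LSA's internals, which this post does not describe; it only compares an idealized quadratic cost curve with an idealized linear one, taking 1M as 1,000,000 tokens and 128K as 128,000 tokens. The function name is hypothetical.

```python
def relative_attention_cost(n_long: int, n_short: int, exponent: float) -> float:
    """Attention cost at n_long tokens relative to n_short tokens,
    under the simplifying assumption that cost grows as n ** exponent."""
    return (n_long / n_short) ** exponent


# Idealized dense attention: cost grows with the square of sequence length.
ratio_dense = relative_attention_cost(1_000_000, 128_000, exponent=2.0)
# Idealized linear attention: cost grows in proportion to sequence length.
ratio_linear = relative_attention_cost(1_000_000, 128_000, exponent=1.0)

print(f"quadratic scaling: {ratio_dense:.1f}x the 128K cost")
print(f"linear scaling:    {ratio_linear:.1f}x the 128K cost")
```

Under these assumptions the attention path at 1M costs about 61× the 128K cost with quadratic scaling versus about 7.8× with linear scaling. Real systems sit somewhere between and include costs outside attention, so treat the gap as an upper bound on the benefit.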

This is not just a benchmark curiosity — long context that is cheap to hold unlocks agent patterns that require persistent state. A code review agent that reviews an entire monorepo, a documentation agent that regenerates docs against the full codebase, a migration agent that tracks a two-week task — all of these get economically viable on models that hold 1M context cheaply.

The Honest Positioning

LongCat 2.0 is not competing head-on with Claude Sonnet 5. It's aiming at a different price/context tradeoff: cheap enough for daily coding, long enough for whole-project trajectories. If you have workloads bottlenecked by context length rather than reasoning depth, this is worth a benchmark run against your own agent traces.

If your coding tasks fit inside 128K and don't need multi-hour agent runs, DeepSeek V4-Flash remains the cheaper answer. If they routinely bust 200K and you're paying Claude for the privilege, LongCat 2.0 is the model to evaluate this quarter.


Frequently Asked Questions

How does LongCat 2.0 compare to DeepSeek V4-Flash for daily coding?

DeepSeek V4-Flash is roughly 5× cheaper per input token and 7× cheaper per output token, but caps at 128K context. LongCat 2.0 costs more per token but holds 1M native context, so it wins when a session or trajectory needs to exceed DeepSeek's window without restart overhead.

What is Linear Sparse Attention (LSA)?

A sparse attention mechanism used in LongCat 2.0 that reduces the cost of long-context inference from quadratic toward linear. Practically, it lets 1M context sessions run without the latency and KV cache cost penalties typical of dense attention at that scale.

Is LongCat 2.0 worth the cost premium over DeepSeek for agent coding?

Only if you regularly exceed 128K context per session or need long agentic trajectories. For scoped coding tasks under 128K, DeepSeek V4-Flash remains cheaper and adequate.

What are the 48B active parameters in a 1.6T MoE model?

MoE (mixture of experts) models activate only a subset of parameters per token. LongCat 2.0's 1.6T total parameters route through gating layers that select roughly 48B active parameters per forward pass — giving frontier-scale capacity at mid-scale inference cost.
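As an illustration of the gating idea, here is a toy top-k router in plain Python. It is a generic sketch of top-k expert selection, not LongCat 2.0's actual router: the expert count, the value of k, and the function name `top_k_route` are all hypothetical, since this post does not state them.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Select the k highest-scoring experts for one token and
    softmax-normalize their scores into mixing weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
NUM_EXPERTS, K = 64, 2  # hypothetical values, chosen only for illustration
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
routes = top_k_route(logits, K)  # list of (expert_index, weight) pairs

# The active share implied by the figures in this post: 48B of 1.6T parameters.
active_fraction = 48e9 / 1.6e12
print(f"experts used for this token: {[i for i, _ in routes]}")
print(f"active parameter share: {active_fraction:.1%}")
```

Only the selected experts run for a given token, which is why per-token compute tracks the roughly 3% active share rather than the full parameter count.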

Does LongCat 2.0 support prompt caching?

Yes. The input cache price is $0.015/M tokens, 50× below the standard $0.75/M input rate. For agents with stable system prompts and repeated codebase reads, cache reuse cuts effective input cost in proportion to the cache hit rate.
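The effect of caching on input cost is a weighted average, sketched below. The two prices come from this post; the hit rates are hypothetical, and the sketch assumes cached tokens are billed purely at the cache rate with no separate write or storage fee, which may not match a given provider's billing.

```python
INPUT_PRICE = 0.75    # USD per million uncached input tokens (from this post)
CACHED_PRICE = 0.015  # USD per million cached input tokens (from this post)


def effective_input_price(cache_hit_rate: float) -> float:
    """Blended USD price per million input tokens for a given cache hit rate.

    cache_hit_rate: fraction of input tokens served from cache, in [0, 1].
    """
    if not 0 <= cache_hit_rate <= 1:
        raise ValueError("cache_hit_rate must be in [0, 1]")
    return cache_hit_rate * CACHED_PRICE + (1 - cache_hit_rate) * INPUT_PRICE


# Hypothetical hit rates; measure your own from provider usage reports.
for rate in (0.0, 0.5, 0.8):
    print(f"hit rate {rate:.0%}: ${effective_input_price(rate):.3f} per million input tokens")
```

At a hypothetical 80% hit rate the blended input price comes to about $0.16 per million tokens, which is close to DeepSeek V4-Flash's uncached list price of $0.14.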