Cohere North Mini Code: 80% SWE-Bench at 3B Active Parameters
June 10, 2026 · 7 min read
Near-Frontier Coding From a Tiny Model
Cohere has released North Mini Code, a 30B-parameter Mixture-of-Experts (MoE) model that activates only 3B parameters per forward pass. The headline number is 80.2% on SWE-Bench pass@10, a benchmark that measures real-world software engineering capability rather than isolated function generation. The model is released under Apache 2.0, which means free commercial use with no API dependency.
For context, the current SWE-Bench frontier sits around 85-90% for the largest proprietary models. Achieving 80% with just 3B active parameters means inference costs that are a fraction of what frontier APIs charge — while delivering genuinely useful coding assistance.
Understanding the MoE Advantage for Cost
Mixture-of-Experts models contain many parameters (30B in this case) but activate only a subset of them for each token. In North Mini Code, a learned router sends each token to the most relevant experts, so only about 3B of the 30B parameters participate in any single forward pass. The result is the quality of a large model at the inference cost of a small one.
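To make the routing idea concrete, here is a toy sketch of top-k expert routing in plain Python. The eight experts, `top_k=2`, and the single linear router are illustrative assumptions, not North Mini Code's published configuration. Real MoE layers route inside every transformer layer with learned weights, but the key property is the same: experts that are not selected are never evaluated.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a short list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, router_rows, experts, top_k=2):
    """Send one token through only the top_k experts picked by a linear router."""
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_rows]
    # Select the top_k experts; the remaining experts are never called.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the selected logits into gate weights that sum to 1.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, chosen

# Toy setup (hypothetical): 8 experts that just scale their input, plus a call counter.
calls = [0] * 8

def make_expert(i):
    def expert(v):
        calls[i] += 1
        return [(i + 1) * x for x in v]
    return expert

experts = [make_expert(i) for i in range(8)]
router_rows = [[0.0, 0.0] for _ in range(8)]
router_rows[3] = [2.0, 0.0]
router_rows[5] = [1.0, 0.0]

out, chosen = moe_forward([1.0, 0.0], router_rows, experts, top_k=2)
print(chosen, calls)  # [3, 5] [0, 0, 0, 1, 0, 1, 0, 0]: only 2 of 8 experts ran
```

Total parameters (all eight experts) determine memory; the parameters actually touched per token (two experts plus the router) determine compute.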
In practical terms, per-token inference compute scales with active parameters, not total parameters. Each token processed by North Mini Code needs roughly the same compute as a token through a dense 3B model, yet the output quality is comparable to models 5-10x larger. This architectural efficiency is what makes the 80% SWE-Bench score so significant for cost optimization.
Memory requirements are higher than a true 3B model (you still need to load 30B parameters into memory), but inference FLOPs — the main driver of per-token cost on cloud hardware — are dramatically lower.
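A common rule of thumb puts forward-pass compute at about 2 FLOPs per active parameter per token. The sketch below applies it to North Mini Code's 3B active parameters and to a hypothetical dense model of the same 30B total size. It ignores the attention-over-context term, which grows with sequence length, so treat the results as rough estimates.

```python
def forward_flops_per_token(active_params: float) -> float:
    # Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter per token.
    # Attention over the context adds more, scaling with sequence length; ignored here.
    return 2 * active_params

moe = forward_flops_per_token(3e9)          # North Mini Code: 3B active parameters
dense_30b = forward_flops_per_token(30e9)   # hypothetical dense model with 30B parameters
print(f"MoE: {moe:.1e} FLOPs/token, dense 30B: {dense_30b:.1e}, ratio {dense_30b / moe:.0f}x")
# MoE: 6.0e+09 FLOPs/token, dense 30B: 6.0e+10, ratio 10x
```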
SWE-Bench: Why This Benchmark Matters
SWE-Bench tests models on real GitHub issues from popular Python repositories. The model must understand the issue description, navigate a full codebase, identify the relevant files, and produce a correct patch. This is radically different from HumanEval (isolated function completion) — it measures the kind of work developers actually do.
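80.2% pass@10 means that when the model is given 10 attempts per issue, at least one attempt produces a correct fix for 80.2% of the issues. pass@10 is more lenient than a single-attempt (pass@1) resolution rate, so it is not directly comparable with pass@1 figures. For practical coding assistance, this translates to: most real bugs can be fixed by this model if you allow a few retries and have a way, such as a test suite, to recognize the attempt that works. The cost of those retries is minimal when inference is nearly free.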
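If you want to measure pass@k on your own tasks, a standard choice is the unbiased estimator introduced alongside the HumanEval benchmark: draw n ≥ k samples per task, count the c correct ones, and estimate the chance that a random subset of k samples contains at least one correct one. The article does not say how Cohere computed its 80.2%, so treat this as an assumption; the per-task counts below are hypothetical and only illustrate how far pass@10 can sit above pass@1.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples drawn, c of them correct (n >= k)."""
    # 1 minus the probability that k samples drawn without replacement are all wrong.
    # comb(n - c, k) is 0 when fewer than k samples are wrong, so pass@k becomes 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(tasks, k):
    # tasks: list of (n, c) pairs, one per benchmark issue; the score is the mean.
    return sum(pass_at_k(n, c, k) for n, c in tasks) / len(tasks)

# Hypothetical results for three issues, 10 samples each.
tasks = [(10, 1), (10, 0), (10, 5)]
print(f"pass@1 = {benchmark_pass_at_k(tasks, 1):.2f}, pass@10 = {benchmark_pass_at_k(tasks, 10):.2f}")
# pass@1 = 0.20, pass@10 = 0.67
```

An issue the model solves only once in ten tries counts fully toward pass@10, which is why a practical workflow needs an automated check to pick out the successful attempt.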
Cost Comparison: North Mini Code vs API Models
Let's compare the economics. For API-hosted options, we use list prices per million input/output tokens. For North Mini Code, we estimate costs for self-hosting on cloud GPU instances (A10G or L4) and on local hardware:
| Model | SWE-Bench | Est. Cost per 1M Tokens (Input/Output) | License | Hosting |
|---|---|---|---|---|
| Claude Opus 4.8 | ~88% | $5/$25 | Proprietary | API only |
| Claude Sonnet 4.6 | ~82% | $3/$15 | Proprietary | API only |
| North Mini Code (cloud) | 80.2% | ~$0.30/$0.60 | Apache 2.0 | Self-host |
| North Mini Code (local) | 80.2% | ~$0.01/$0.02 | Apache 2.0 | Local GPU |
| DeepSeek Coder V3 | ~75% | $0.50/$1.00 | MIT | API or self-host |
| Claude Haiku 4.5 | ~65% | $1/$5 | Proprietary | API only |
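The numbers tell a striking story. Self-hosted on cloud hardware, North Mini Code reaches roughly 98% of Sonnet's headline SWE-Bench score (80.2 vs ~82, though the two figures may not use the same attempt budget) at 4-10% of its per-token price ($0.30 vs $3 for input, $0.60 vs $15 for output). On local hardware with a capable GPU (RTX 4090 or similar), the per-token cost approaches zero.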
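Price ratios like these are easy to recompute from the table. This sketch copies the listed prices and scores (the North Mini Code prices are the article's self-hosting estimates, and the scores may not share the same attempt budget) and computes the ratios for input and output tokens separately, since the two are priced differently.

```python
# Per-1M-token prices (input, output) copied from the table above.
prices = {
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "North Mini Code (cloud)": (0.30, 0.60),
}
scores = {"Claude Sonnet 4.6": 82.0, "North Mini Code (cloud)": 80.2}

def price_ratio(cheap: str, expensive: str) -> tuple[float, float]:
    # Price of `cheap` as a fraction of `expensive`, for input and output tokens.
    (ci, co), (ei, eo) = prices[cheap], prices[expensive]
    return ci / ei, co / eo

inp, out = price_ratio("North Mini Code (cloud)", "Claude Sonnet 4.6")
score_share = scores["North Mini Code (cloud)"] / scores["Claude Sonnet 4.6"]
print(f"score share {score_share:.1%}, input price {inp:.0%}, output price {out:.0%}")
# score share 97.8%, input price 10%, output price 4%

opus_in, opus_out = price_ratio("North Mini Code (cloud)", "Claude Opus 4.8")
print(f"Opus costs {1 / opus_in:.0f}x (input) to {1 / opus_out:.0f}x (output) as much")
# Opus costs 17x (input) to 42x (output) as much
```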
Monthly Cost Scenarios
For a developer processing 200 coding tasks per day (roughly 500K tokens total daily throughput):
| Setup | Monthly Cost | Quality Trade-off |
|---|---|---|
| 100% Sonnet 4.6 | ~$139 | Best overall quality |
| 100% North Mini Code (cloud A10G) | ~$45 (instance cost) | Slightly lower on hardest tasks |
| North Mini Code (local RTX 4090) | ~$8 (electricity) | Same as cloud, slower throughput |
| Hybrid: 70% North Mini + 30% Sonnet | ~$55-73 | Near-Sonnet quality overall |
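A small cost model makes these scenarios easier to adapt to your own volume. It is a sketch under stated assumptions: the article doesn't give its input/output token split, so `output_share=0.5` is a hypothetical value (it yields $135 for Sonnet rather than the table's ~$139), and self-hosting is treated as a flat monthly cost, like the instance and electricity figures above.

```python
def api_monthly_cost(tokens_per_day, days, price_in, price_out, output_share):
    """Monthly API spend in USD; prices are per 1M tokens."""
    total = tokens_per_day * days
    out_tokens = total * output_share
    in_tokens = total - out_tokens
    return (in_tokens * price_in + out_tokens * price_out) / 1_000_000

def hybrid_monthly_cost(api_only_cost, api_fraction, self_host_monthly):
    # Self-hosting is modeled as a flat monthly cost (instance hours or electricity),
    # so only the share of work sent to the API adds per-token spend.
    return self_host_monthly + api_fraction * api_only_cost

# 500K tokens/day for 30 days; the 50/50 input/output split is a hypothetical assumption.
sonnet_only = api_monthly_cost(500_000, 30, 3.00, 15.00, output_share=0.5)
local_hybrid = hybrid_monthly_cost(sonnet_only, api_fraction=0.30, self_host_monthly=8)
print(f"Sonnet only: ${sonnet_only:.2f}/month, local hybrid: ${local_hybrid:.2f}/month")
```

Swap in your own token volume, output share, and hosting cost; the output share matters most, because output tokens cost 5x as much as input tokens at Sonnet's list price.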
The Open-Source Coding Model Landscape
North Mini Code enters a crowded field but carves a unique position. Compared to alternatives:
vs DeepSeek Coder V3 (236B): North Mini Code achieves a higher SWE-Bench score with dramatically less compute. DeepSeek Coder V3 requires multiple high-end GPUs for self-hosting; North Mini Code runs on a single GPU.
vs Gemma 4 12B: Gemma fits in less memory (16GB vs ~20GB for North Mini) but scores significantly lower on real-world coding benchmarks. North Mini's MoE architecture provides better quality at a slight memory premium.
vs Qwen 3 Coder 32B: Similar quality tier but North Mini achieves it with 10x fewer active parameters, meaning faster inference and lower per-token compute costs.
Practical Limitations
Before you replace your Sonnet subscription entirely:
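Memory requirements: 30B total parameters means roughly 60GB of weights in FP16, ~30GB at 8-bit, or ~15GB with 4-bit quantization. A 24GB GPU (RTX 4090, A10G, or L4) can hold a 4-bit build with headroom for the KV cache; cards with 12-16GB of VRAM would need to offload some weights to system RAM, at a cost in throughput.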
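The weight-memory arithmetic is simply total parameters times bytes per parameter. This sketch covers weights only; real 4-bit formats add a little overhead for quantization scales, and the KV cache and runtime buffers need memory on top.

```python
# Bytes per weight at common precisions (4-bit formats add a little overhead for scales).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(total_params: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # Every expert must be resident, so this uses TOTAL parameters, not active ones.
    return total_params * BYTES_PER_PARAM[precision] / 1e9

for precision in ("fp16", "int8", "int4"):
    print(f"{precision}: {weight_memory_gb(30e9, precision):.0f} GB")
# fp16: 60 GB
# int8: 30 GB
# int4: 15 GB
```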
The 80% vs 88% gap matters: For the hardest 20% of coding tasks — complex multi-step reasoning, subtle bug patterns, architecture-level decisions — frontier models still outperform. North Mini Code excels at the "middle 60%" of coding work: clear bug fixes, feature additions, test writing, and refactoring.
No multimodal: Unlike Gemma 4, North Mini Code is text/code only. If your workflow involves screenshotting UIs for implementation or processing diagrams, you'll still need a multimodal model.
The Bigger Picture: Inference Cost Collapse
North Mini Code represents a trend that's reshaping AI coding economics: the gap between open-source and proprietary model quality is closing faster than the pricing gap. When a free Apache 2.0 model hits 80% on SWE-Bench and the best proprietary model is at 88%, the question shifts from "can open-source models code?" to "are the remaining 8 percentage points worth roughly 17-42x the per-token price?"
For many teams, the answer is increasingly "no" — at least for the majority of their workload. The economically optimal strategy is moving toward a tiered approach: free/cheap open-source for routine tasks, proprietary frontier models reserved for genuinely hard problems where first-attempt accuracy justifies the premium.
Bottom Line
Cohere North Mini Code at 80.2% SWE-Bench with 3B active parameters is a landmark for AI coding cost optimization. It proves that near-frontier coding quality doesn't require frontier pricing. Self-hosted on a single GPU, it offers a 10-25x per-token cost reduction versus Sonnet-class APIs while handling the majority of real-world coding tasks competently. Combined with a proprietary model for the hardest tasks, it brings the monthly budget for the workload above to roughly $55-73, versus ~$139 for Sonnet alone, without sacrificing much practical capability.