Cohere North Mini Code: 80% SWE-Bench at 3B Active Parameters

June 10, 2026 · 7 min read

Near-Frontier Coding From a Tiny Model

Cohere has released North Mini Code, a 30B-parameter Mixture-of-Experts (MoE) model that activates only 3B parameters per forward pass. The headline number is 80.2% on SWE-Bench at pass@10, a benchmark that measures real-world software engineering capability rather than isolated function generation. The model is released under Apache 2.0, which permits free commercial use with no API dependency.

For context, the current SWE-Bench frontier sits around 85-90% for the largest proprietary models. Reaching 80% with just 3B active parameters means inference costs that are a fraction of what frontier APIs charge, while still delivering genuinely useful coding assistance.

Understanding the MoE Advantage for Cost

Mixture-of-Experts models contain many parameters (30B in this case) but only activate a subset for each token. North Mini Code routes each token through the most relevant 3B parameters out of 30B total. The result is quality closer to that of a large model at roughly the per-token compute cost of a small one.

In practical terms, inference compute scales with active parameters, not total parameters. Per token, North Mini Code needs roughly the compute of a 3B dense model but produces output quality comparable to models 5-10x larger. This architectural efficiency is what makes the 80% SWE-Bench score so significant for cost optimization.

Memory requirements are higher than for a true 3B model, because all 30B parameters must still be loaded into memory. Inference FLOPs, the main driver of per-token cost on cloud hardware, are dramatically lower.
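To make the compute-versus-memory split concrete, here is a back-of-the-envelope sketch. It rests on two common rules of thumb that are our assumptions, not Cohere's published figures: a forward pass costs about 2 FLOPs per active parameter per token, and weight memory is simply total parameters times bytes per parameter (KV cache and runtime overhead are ignored).

```python
# Rough MoE sizing sketch.
# Assumptions (not from Cohere): ~2 FLOPs per active parameter per token,
# and weights are the only memory consumer (no KV cache, no overhead).

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for a single token."""
    return 2.0 * active_params


def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 30e9   # every expert must be resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used for each token

moe_flops = flops_per_token(ACTIVE_PARAMS)
dense_flops = flops_per_token(TOTAL_PARAMS)  # hypothetical dense 30B model

print(f"MoE FLOPs per token:       {moe_flops:.1e}")
print(f"Dense 30B FLOPs per token: {dense_flops:.1e}")
print(f"Compute ratio:             {dense_flops / moe_flops:.0f}x")

for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:.0f} GB")
```

Under these assumptions the per-token compute is a tenth of a dense 30B model's, while the weights alone still take about 60 GB at 16-bit and about 15 GB at 4-bit.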

SWE-Bench: Why This Benchmark Matters

SWE-Bench tests models on real GitHub issues from popular Python repositories. The model must understand the issue description, navigate a full codebase, identify the relevant files, and produce a correct patch. This is radically different from HumanEval's isolated function completion: SWE-Bench measures the kind of work developers actually do.

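80.2% pass@10 means that for 80.2% of tasks, at least one of 10 sampled attempts produces a correct fix. A pass@10 figure is necessarily at least as high as the same model's single-attempt (pass@1) score. For practical coding assistance, this translates to: most real bugs can be fixed by this model if you allow a few retries and have tests that tell you which attempt worked. The cost of those retries is minimal when inference is nearly free.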
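For readers who want to see how a pass@k figure is computed, here is a minimal sketch. It shows the widely used unbiased estimator (draw n samples per task, count c correct ones) alongside a simpler closed form that assumes every attempt succeeds independently with the same probability p. The independence assumption and all the numbers below are ours, for illustration only; they are not North Mini Code results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_independent(p: float, k: int) -> float:
    """pass@k if each attempt succeeded independently with probability p."""
    return 1.0 - (1.0 - p) ** k


# Illustrative values only (hypothetical task and success rate).
print(f"n=20, c=3, k=10 -> {pass_at_k(20, 3, 10):.3f}")
print(f"n=20, c=3, k=1  -> {pass_at_k(20, 3, 1):.3f}")
print(f"p=0.15, k=10    -> {pass_at_k_independent(0.15, 10):.3f}")
```

The benchmark score is the average of the per-task estimate over all tasks. The gap between the k=1 and k=10 lines is why a pass@10 number should not be read as a single-attempt success rate.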

Cost Comparison: North Mini Code vs API Models

Let's compare the economics. For API-hosted options, we'll use standard pricing. For North Mini Code, we'll estimate costs based on self-hosting on cloud GPU instances (A10G or L4) and local hardware:

| Model | SWE-Bench | Est. cost per 1M tokens (input/output) | License | Hosting |
|---|---|---|---|---|
| Claude Opus 4.8 | ~88% | $5/$25 | Proprietary | API only |
| Claude Sonnet 4.6 | ~82% | $3/$15 | Proprietary | API only |
| North Mini Code (cloud) | 80.2% | ~$0.30/$0.60 | Apache 2.0 | Self-host |
| North Mini Code (local) | 80.2% | ~$0.01/$0.02 | Apache 2.0 | Local GPU |
| DeepSeek Coder V3 | ~75% | $0.50/$1.00 | MIT | API or self-host |
| Claude Haiku 4.5 | ~65% | $1/$5 | Proprietary | API only |

The numbers tell a striking story. Self-hosted on cloud hardware, North Mini Code reaches about 98% of Sonnet's SWE-Bench score (80.2 vs ~82) at roughly 4-10% of the per-token cost ($0.30 vs $3 for input, $0.60 vs $15 for output). On local hardware with a capable GPU (RTX 4090 or similar), the per-token cost approaches zero.
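The ratios can be recomputed directly from the table's own figures. This is a small sketch rather than a pricing tool: the dictionary layout and function names are ours, and the values are the article's estimates, which may not match current vendor pricing.

```python
# Figures copied from the comparison table (USD per 1M tokens).
SONNET = {"swe_bench": 82.0, "input": 3.00, "output": 15.00}
NORTH_CLOUD = {"swe_bench": 80.2, "input": 0.30, "output": 0.60}


def relative_score(model: dict, baseline: dict) -> float:
    """Benchmark score of `model` as a fraction of `baseline`."""
    return model["swe_bench"] / baseline["swe_bench"]


def cost_fraction(model: dict, baseline: dict, kind: str) -> float:
    """Per-token price of `model` as a fraction of `baseline` ('input' or 'output')."""
    return model[kind] / baseline[kind]


print(f"Relative SWE-Bench score: {relative_score(NORTH_CLOUD, SONNET):.1%}")
print(f"Input cost fraction:      {cost_fraction(NORTH_CLOUD, SONNET, 'input'):.0%}")
print(f"Output cost fraction:     {cost_fraction(NORTH_CLOUD, SONNET, 'output'):.0%}")
```

The input and output fractions differ because the two price columns do not scale together, so the blended saving depends on how output-heavy your workload is.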

Monthly Cost Scenarios

For a developer processing 200 coding tasks per day (roughly 500K tokens total daily throughput):

| Setup | Monthly cost | Quality trade-off |
|---|---|---|
| 100% Sonnet 4.6 | ~$139 | Best overall quality |
| 100% North Mini Code (cloud A10G) | ~$45 (instance cost) | Slightly lower on hardest tasks |
| North Mini Code (local RTX 4090) | ~$8 (electricity) | Same as cloud, slower throughput |
| Hybrid: 70% North Mini + 30% Sonnet | ~$55-73 | Near-Sonnet quality overall |
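If you want to adapt the API rows to your own workload, the arithmetic is simple enough to sketch. The split between input and output tokens is not stated in this scenario, so `INPUT_SHARE` below is an assumption; 48% input is used only because it roughly reproduces the ~$139 Sonnet figure. Self-hosting costs (instance hours, electricity) are fixed costs and are deliberately left out.

```python
def monthly_api_cost(
    tokens_per_day: float,
    input_share: float,
    input_price: float,
    output_price: float,
    days: int = 30,
) -> float:
    """Monthly USD cost for a token-priced API; prices are per 1M tokens."""
    monthly_tokens_m = tokens_per_day * days / 1e6
    blended_price = input_share * input_price + (1 - input_share) * output_price
    return monthly_tokens_m * blended_price


TOKENS_PER_DAY = 500_000   # from the scenario above
INPUT_SHARE = 0.48         # assumption: not stated in the scenario
SONNET_SHARE = 0.30        # Sonnet's share of traffic in the hybrid setup

sonnet_full = monthly_api_cost(TOKENS_PER_DAY, INPUT_SHARE, 3.00, 15.00)
sonnet_hybrid = SONNET_SHARE * sonnet_full

print(f"100% Sonnet:                ${sonnet_full:.2f}")
print(f"Sonnet part of 70/30 split: ${sonnet_hybrid:.2f} (before self-hosting costs)")

# Sensitivity: the total moves a lot with the input/output mix.
for share in (0.3, 0.5, 0.7):
    cost = monthly_api_cost(TOKENS_PER_DAY, share, 3.00, 15.00)
    print(f"input share {share:.0%}: ${cost:.2f}")
```

Because output tokens cost five times as much as input tokens at these prices, the input/output mix is the first thing to measure before trusting any monthly estimate.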

The Open-Source Coding Model Landscape

North Mini Code enters a crowded field but carves a unique position. Compared to alternatives:

vs DeepSeek Coder V3 (236B): North Mini Code achieves higher SWE-Bench scores with dramatically less compute. DeepSeek Coder V3 requires multiple high-end GPUs for self-hosting; North Mini Code runs on a single GPU.

vs Gemma 4 12B: Gemma fits in less memory (16GB vs ~20GB for North Mini) but scores significantly lower on real-world coding benchmarks. North Mini's MoE architecture provides better quality at a slight memory premium.

vs Qwen 3 Coder 32B: Similar quality tier but North Mini achieves it with 10x fewer active parameters, meaning faster inference and lower per-token compute costs.

Practical Limitations

Before you replace your Sonnet subscription entirely:

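Memory requirements: 30B total parameters means roughly 60GB of weights in FP16 (2 bytes per parameter) or roughly 15GB in 4-bit quantization, before KV cache and runtime overhead. In practice that points to a 24GB GPU for running a 4-bit build entirely on the card; smaller cards have to offload part of the weights to system memory, which slows generation.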

The 80% vs 88% gap matters: For the hardest 20% of coding tasks (complex multi-step reasoning, subtle bug patterns, architecture-level decisions), frontier models still outperform. North Mini Code excels at the "middle 60%" of coding work: clear bug fixes, feature additions, test writing, and refactoring.

No multimodal: Unlike Gemma 4, North Mini Code is text/code only. If your workflow involves screenshotting UIs for implementation or processing diagrams, you'll still need a multimodal model.

The Bigger Picture: Inference Cost Collapse

North Mini Code represents a trend that's reshaping AI coding economics: the gap between open-source and proprietary model quality is closing faster than the pricing gap. When a free Apache 2.0 model hits 80% on SWE-Bench and the best proprietary model is at 88%, the question shifts from "can open-source models code?" to "are the remaining 8 points worth roughly 17-42x the per-token cost?" (Opus at $5/$25 versus cloud self-hosting at ~$0.30/$0.60).

For many teams, the answer is increasingly "no", at least for the majority of their workload. The economically optimal strategy is moving toward a tiered approach: free or cheap open-source models for routine tasks, with proprietary frontier models reserved for genuinely hard problems where first-attempt accuracy justifies the premium.
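A tiered setup like this can be sketched in a few lines. Everything here is hypothetical scaffolding: `cheap` and `frontier` stand in for whatever client calls you use for a self-hosted model and a proprietary API, and `verify` stands in for running your test suite against a candidate patch. The retry count is an arbitrary default, not a recommendation.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces: a generator returns a candidate patch (or None),
# and a verifier returns True if the patch passes the project's tests.
Generate = Callable[[str], Optional[str]]
Verify = Callable[[str], bool]


def solve_tiered(
    task: str,
    cheap: Generate,
    frontier: Generate,
    verify: Verify,
    cheap_attempts: int = 3,
) -> Tuple[Optional[str], str]:
    """Try the cheap model a few times, then escalate to the frontier model."""
    for _ in range(cheap_attempts):
        patch = cheap(task)
        if patch is not None and verify(patch):
            return patch, "cheap"
    patch = frontier(task)
    if patch is not None and verify(patch):
        return patch, "frontier"
    return None, "unsolved"


# Stub demo: the cheap model fails twice, then succeeds on the third try.
_cheap_outputs = iter([None, "bad patch", "good patch"])
result = solve_tiered(
    "fix failing date parser",
    cheap=lambda task: next(_cheap_outputs),
    frontier=lambda task: "good patch",
    verify=lambda patch: patch == "good patch",
)
print(result)
```

The design hinges on having an automatic verifier. Without tests or some other check, retries on the cheap tier cannot tell a correct patch from a plausible one, and the escalation step never knows when to fire.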

Bottom Line

Cohere North Mini Code at 80.2% SWE-Bench (pass@10) with 3B active parameters is a landmark for AI coding cost optimization. It shows that near-frontier coding quality doesn't require frontier pricing. Self-hosted on a single GPU, it offers a 10-25x per-token cost reduction versus Sonnet-class API pricing while handling the majority of real-world coding tasks competently. Combined with a proprietary model for the hardest tasks, it brings monthly AI coding budgets into the $55-73 range estimated above without sacrificing much practical capability.
