Mistral OCR 4 Ships with Bounding Boxes and Self-Hosting: Document-RAG Cost Math vs OpenAI Vision

June 24, 2026 · 7 min read

Stack of paper documents with handwritten annotations next to a laptop

What Changed With OCR 4

Mistral released Mistral OCR 4 on June 23, 2026, with three feature additions that change the OCR build-vs-buy math for AI coding agents that process documents:

Bounding boxes for every extracted text span (essential for layout-aware RAG)
Per-span confidence scores (lets you route low-confidence pages to a stronger model)
Self-hostable weights with permissive licensing (the headline change)

The launch puts Mistral squarely in competition with OpenAI's Vision API, Google's Gemini 2.5 Flash for OCR tasks, AWS Textract, and Anthropic Claude's vision capabilities. For developers building document-grounded AI coding agents (contracts, invoices, code documentation PDFs), this is the first time self-hosting has been competitive on quality with the hosted options.

Per-1000-Page Cost Comparison

Standard benchmark: a 1,000-page PDF batch, mix of typeset and scanned content, ~800 tokens of extracted text per page on average. Pricing as of June 2026:

OpenAI Vision (GPT-5.5 with image input): ~$25-30 per 1000 pages. Excellent quality, no bounding boxes natively (have to derive them), no batch discount.

Gemini 2.5 Flash Image: ~$8-12 per 1000 pages. Good quality on typeset text, drops accuracy on handwriting. Bounding boxes via separate API call.

AWS Textract: $1.50 per 1000 pages for plain text + $15 for tables/forms. Solid on financial documents, worse on engineering PDFs with mixed content.

Mistral OCR 4 hosted API: ~$3-5 per 1000 pages (Mistral's published pricing). Includes bounding boxes and confidence scores natively.

Mistral OCR 4 self-hosted on H100: ~$0.40-0.80 per 1000 pages, dominated by GPU rental. Throughput on a single H100 lands around 50-80 pages/second.

When Self-Hosting Pays Off

Self-hosting Mistral OCR 4 on a rented H100 ($2-3/hour spot, $4-5/hour on-demand) breaks even against the hosted API at roughly 10K-20K pages per day. Above that volume, self-hosting is meaningfully cheaper. Below it, the hosted API wins after accounting for ops overhead.

Real-world thresholds:

Sub-100K pages/month: stick with hosted API. Ops overhead dominates.
100K-500K pages/month: hosted API is fine, but self-hosting becomes attractive if you have an ops team already running GPU infra.
500K+ pages/month: self-host. The savings compound fast.

Quality Comparison on Coding-Adjacent Documents

For AI coding agents specifically, OCR quality matters most on:

Code-in-PDF extraction. Mistral OCR 4 preserves indentation and code structure better than Vision API alternatives, because its training corpus weighted technical documents heavily. Critical for ingesting API documentation PDFs.

Tables and layouts. AWS Textract still leads on financial-document tables. Mistral OCR 4 is competitive on engineering tables but lags Textract on pure financial layouts.

Handwriting. All current OCR tools are still mediocre on handwriting. GPT-5.5 Vision is marginally the best, but the gap closed in 2026.

A Hybrid Pipeline That Saves 70%

The most cost-effective production pattern as of mid-2026: run Mistral OCR 4 self-hosted as the default extractor; use the confidence scores to flag low-confidence pages; route those pages only to GPT-5.5 Vision or Claude Vision for re-extraction. Typical re-route rate is 5-10%, so the bulk of cost stays on the cheap path.

For a 1M-pages/month workload, the math:

Pure GPT-5.5 Vision: $25,000-$30,000/month
Pure Mistral OCR 4 self-hosted: $400-$800/month
Hybrid (5% re-route to GPT-5.5): $1,650-$2,300/month total

The hybrid produces near-GPT-5.5 quality at 90%+ cost savings. The bounding boxes and confidence scores added in OCR 4 are what make this pipeline practical — without confidence scores, you cannot reliably gate when to escalate.

Implementation Notes

The Mistral OCR 4 weights are around 8B parameters and fit comfortably on a single 80GB GPU. INT8 quantization runs on 24GB consumer GPUs with about 15% quality loss — usable for batch jobs where the small drop is acceptable but rarely worth it for production RAG.

For teams already running Llama or DeepSeek inference, Mistral OCR 4 fits the same vLLM/TGI infrastructure. Operationally it is a single image in your existing GPU pool, not a new platform.

Frequently Asked Questions

How much does Mistral OCR 4 cost per 1000 pages compared to OpenAI Vision?

Mistral OCR 4 hosted API is ~$3-5 per 1000 pages vs ~$25-30 for OpenAI Vision. Self-hosted on an H100, Mistral OCR 4 drops to ~$0.40-0.80 per 1000 pages — roughly 30-60x cheaper than Vision API.

When should I self-host Mistral OCR 4 instead of using the hosted API?

Breakeven is around 10K-20K pages per day. Below 100K pages/month, use the hosted API. 100K-500K/month, self-host if you have GPU infra ops already. Above 500K/month, self-host always — the savings compound fast.

Is Mistral OCR 4 as accurate as GPT-5.5 Vision for code-in-PDF extraction?

For typical technical documentation, yes — Mistral OCR 4 actually preserves code indentation and structure better because of its training corpus weighting. GPT-5.5 Vision still leads on handwriting and edge-case layouts.

What's the cheapest production OCR pipeline for AI coding agents?

A hybrid: Mistral OCR 4 self-hosted as the default extractor + GPT-5.5 Vision for the 5-10% of pages flagged as low-confidence by Mistral's confidence scores. For 1M pages/month, this runs $1,650-$2,300 vs $25,000+ for pure Vision API — about 90% savings at near-equivalent quality.

Want to calculate exact costs for your project?

Estimate Your AI Coding Costs →Compare Token Pricing →

What Is a Self-Hosted OCR Pipeline? Cost Math for AI Coding Agents That Process PDFs

If your AI coding agent ingests PDFs (API docs, contracts, internal manuals), self-hosting OCR can cut document costs 90%+. We explain what a self-hosted OCR pipeline looks like, when it pays off, and how to build one.

OpenRouter Adds Data Residency Routing: Compliance Cost vs Self-Hosting a Gateway

OpenRouter's June 2026 routing controls let teams enforce data residency through provider order, allow_fallbacks, and ZDR flags. We compare the compliance cost against self-hosting LiteLLM.

Self-Hosted vs API AI Coding: Total Cost of Ownership in 2026

A comprehensive TCO analysis comparing self-hosted open-source models against cloud API services for AI coding in 2026. Covers hardware costs, operational overhead, and the crossover points where each approach wins.

← Previous

Apple Research: 9 LLM Judges = 2 Independent Votes. Stop Paying for Redundant Judge Panels

IBM Open-Sources CUGA: Lightweight Agent Framework Cuts 80% of Custom Engineering Cost