AI-Assisted i18n Translation Cost: Token Math per 1,000 UI Strings (Claude, GPT, Gemini 2026)
By Eric Bush · July 4, 2026 · 9 min read
Why "It's Just Text" Is Wrong
The naive calculation for translating 1,000 UI strings looks like this: average English string is 8 words ≈ 12 tokens. Add the same for the translated output. Times 1,000. That is 24,000 tokens, or 24 cents on Claude Sonnet. It is a very wrong number.
The real cost multipliers, in order of impact:
- Context per call. You cannot translate an ICU MessageFormat string safely without passing the format specification, the placeholder semantics, and often the containing screen context. That is 300-500 tokens of overhead per call.
- Brand terms and glossary. The product name, feature names, and industry jargon must appear verbatim. A glossary of 50 terms plus rules for when to translate them adds 400-800 tokens per call.
- Verification. Any production i18n pipeline runs at least one verification pass to catch placeholder misalignment, character-limit violations, and RTL issues. That is another full round-trip.
- Ambiguity resolution. "Save" could mean rescue or persist-to-disk. Well-designed UI systems pass screen/tooltip context; that is another 100-200 tokens per string.
Realistic Per-String Token Count
Bundling in the multipliers, a production-quality LLM translation of one string typically consumes:
- Input: ~1,200 tokens (system prompt + glossary + context + string).
- Output: ~30 tokens (the translation).
- Verification pass: ~800 input + ~30 output tokens.
- Per string total: ~2,060 tokens.
Batching (10 strings per call, shared context) reduces the per-string overhead significantly. With batching, per-string effective cost drops to ~350-500 tokens. The math below assumes 400 tokens per string with batching enabled.
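The token arithmetic above can be reproduced in a few lines. This is a minimal sketch that restates the article's figures; the `shared_overhead` and `per_string` values passed to the batching function are illustrative assumptions chosen to land inside the ~350-500 range, not measurements.

```python
# Token figures copied from the article; all names are illustrative.
NAIVE_TOKENS_PER_STRING = 12 + 12   # source string + translated output

TRANSLATE_INPUT = 1_200   # system prompt + glossary + context + string
TRANSLATE_OUTPUT = 30     # the translation
VERIFY_INPUT = 800        # verification pass input
VERIFY_OUTPUT = 30        # verification pass output


def unbatched_tokens_per_string() -> int:
    """Translation pass plus verification pass, one string per call."""
    return TRANSLATE_INPUT + TRANSLATE_OUTPUT + VERIFY_INPUT + VERIFY_OUTPUT


def batched_tokens_per_string(shared_overhead: int, per_string: int,
                              batch_size: int) -> float:
    """Spread the shared overhead (prompts, glossary) across one batch.

    shared_overhead and per_string are assumptions, not article figures.
    """
    return shared_overhead / batch_size + per_string


naive_total = NAIVE_TOKENS_PER_STRING * 1_000              # naive estimate
unbatched_total = unbatched_tokens_per_string() * 1_000    # one string per call
batched_total = 400 * 1_000                                # article's batched assumption
```

With an assumed 2,000 tokens of shared overhead across both passes and 200 tokens of per-string payload, `batched_tokens_per_string(2_000, 200, 10)` comes out at 400 tokens, which matches the figure used for the cost table.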
1,000 UI Strings, Per Target Language, By Model
| Model | Cost / language | Quality notes |
|---|---|---|
| Claude Opus 4.8 | ~$8-$12 | Best for tone-sensitive UI, marketing copy |
| Claude Sonnet 5 | ~$1.60-$2.40 | Best default; fluent, respects glossary |
| GPT-5.5 | ~$2-$3 | Slightly more literal, cheap output tokens |
| Gemini 3 Pro | ~$1.20-$1.80 | Strong on Asian languages, weaker on nuance |
| Haiku 4.5 | ~$0.60-$0.90 | Cheapest; needs stricter verification pass |
| DeepSeek V3 | ~$0.15-$0.30 | Ultra-cheap; strong Chinese, weaker on RTL |
For a product supporting 10 languages, the total translation bill runs $6-$120 across the Claude, GPT, and Gemini models above (as little as $1.50 with DeepSeek V3), often cheaper than one hour of a professional translator's time.
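To scale the table to a specific project, the per-language ranges can be multiplied out. The sketch below copies the ranges from the table above and assumes cost grows linearly with both language count and string count, which is a simplification; `project_cost` is a hypothetical helper name.

```python
# Per-language cost ranges in USD for 1,000 strings, copied from the table above.
COST_PER_LANGUAGE = {
    "Claude Opus 4.8": (8.00, 12.00),
    "Claude Sonnet 5": (1.60, 2.40),
    "GPT-5.5": (2.00, 3.00),
    "Gemini 3 Pro": (1.20, 1.80),
    "Haiku 4.5": (0.60, 0.90),
    "DeepSeek V3": (0.15, 0.30),
}


def project_cost(model: str, languages: int, strings: int = 1_000):
    """Return the (low, high) USD estimate for a whole project.

    Assumes linear scaling in languages and strings (an assumption,
    not something the table guarantees).
    """
    low, high = COST_PER_LANGUAGE[model]
    scale = languages * strings / 1_000
    return (round(low * scale, 2), round(high * scale, 2))


for model in COST_PER_LANGUAGE:
    low, high = project_cost(model, languages=10)
    print(f"{model}: ${low:.2f}-${high:.2f} for 10 languages")
```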
Where the Hidden Cost Bites
Three failure modes generate expensive rework:
- Placeholder drift. The model rewrites `{userName}` as `{utilisateur}` in French. UI breaks at runtime. Verification pass or strict schema output catches this.
- Length overflow. German translations are on average 30% longer than English source. Buttons and menu items overflow. Pass a max-character constraint per string.
- Plural forms. Slavic languages have 3-4 plural forms; Arabic has 6. Naïve string-per-string translation loses this. Use structured plural output (ICU MessageFormat).
Each failure caught in verification instead of in production saves ~$40-$80 of engineering time. That is why the verification pass, which adds roughly 70% to per-string tokens in the unbatched figures above (~830 on top of ~1,230), pays for itself many times over.
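Part of the verification pass does not need an LLM at all. The following is a rough sketch of deterministic checks for the three failure modes, under stated assumptions: the regexes are heuristics rather than a real ICU MessageFormat parser, `verify` is a hypothetical function name, and the plural table covers only three locales using the CLDR cardinal categories.

```python
import re
from typing import List, Optional, Set

# Heuristic: matches flat arguments such as {userName}. It would misread
# single-word plural branches like "one {file}", so plural messages are
# routed to a separate check below. A real pipeline should use an ICU parser.
PLACEHOLDER = re.compile(r"\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}")
IS_PLURAL = re.compile(r",\s*plural\s*,")
PLURAL_BRANCH = re.compile(r"\b(zero|one|two|few|many|other)\s*\{")

# CLDR cardinal plural categories for a few example locales.
REQUIRED_PLURALS = {
    "en": {"one", "other"},
    "ru": {"one", "few", "many", "other"},
    "ar": {"zero", "one", "two", "few", "many", "other"},
}


def placeholder_names(message: str) -> Set[str]:
    return set(PLACEHOLDER.findall(message))


def verify(source: str, translated: str, locale: str,
           max_chars: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems; empty means the string passed."""
    problems = []
    if IS_PLURAL.search(source):
        # Plural forms: every category the target locale needs must be present.
        found = set(PLURAL_BRANCH.findall(translated))
        missing = REQUIRED_PLURALS.get(locale, {"other"}) - found
        if missing:
            problems.append(f"missing plural categories: {sorted(missing)}")
    else:
        # Placeholder drift: names must match exactly, not be translated.
        src, dst = placeholder_names(source), placeholder_names(translated)
        if src != dst:
            problems.append(
                f"placeholder drift: expected {sorted(src)}, got {sorted(dst)}")
    # Length overflow: enforce the per-string character budget, if any.
    if max_chars is not None and len(translated) > max_chars:
        problems.append(f"length {len(translated)} exceeds limit {max_chars}")
    return problems
```

For example, `verify("Hello, {userName}!", "Bonjour, {utilisateur} !", "fr")` reports placeholder drift, while the same call with `{userName}` preserved returns an empty list. Strings that fail would go back for a retry or on to human review.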
Recommended Setup
- Store source strings in an ICU MessageFormat-compatible catalog (JSON, XLIFF, or Fluent).
- Batch 10-20 strings per LLM call, sharing the system prompt and glossary.
- Use Sonnet 5 or Gemini 3 Pro for the translation pass, Haiku for a verification pass.
- Add a schema-enforcing wrapper so the model cannot break placeholder syntax.
- Cap per-string retries at 3; escalate failures to a human review queue.
- Cache translations by source-string hash + glossary version so re-runs are free.
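A few of the steps above (batching, retry capping, and cache keys) can be sketched without committing to any provider SDK. Everything here is an assumption for illustration: the function names are hypothetical, `translate` stands in for whatever LLM call the pipeline makes and is assumed to return `None` when verification fails, and the target locale is added to the cache key because one catalog usually feeds several languages.

```python
import hashlib
from typing import Callable, Iterator, List, Optional


def batches(strings: List[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` strings for one LLM call."""
    for start in range(0, len(strings), size):
        yield strings[start:start + size]


def cache_key(source: str, locale: str, glossary_version: str) -> str:
    """Hash the source string together with locale and glossary version.

    A unit-separator character keeps the fields from running together.
    """
    payload = "\x1f".join([source, locale, glossary_version])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def translate_with_retries(translate: Callable[[str], Optional[str]],
                           source: str,
                           max_retries: int = 3) -> Optional[str]:
    """Try once, then retry up to `max_retries` times.

    Returns None when every attempt fails, so the caller can push the
    string to a human review queue.
    """
    for _ in range(1 + max_retries):
        result = translate(source)
        if result is not None:
            return result
    return None
```

Because the glossary version is part of the key, bumping the glossary invalidates old entries automatically, while unchanged strings keep hitting the cache on re-runs.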
When to Skip LLM Translation Entirely
LLMs are cheaper than professional translators but not always the right tool. Skip LLM translation when:
- Regulatory content requires certified translation (medical, legal, finance).
- Marketing hero copy where brand voice matters more than throughput.
- Languages where the model's training data is thin (some African, Indigenous, low-resource languages).
For everything else (settings screens, error messages, form labels, tooltips), LLM translation at roughly $1-$3 per 1,000 strings per language on a mid-tier model is unusually good value.
Frequently Asked Questions
How much does it cost to translate 1,000 UI strings with an LLM?
Roughly $0.60-$12 per target language across the Claude, GPT, and Gemini models compared here (DeepSeek V3 goes as low as $0.15), assuming batched calls with glossary context and a verification pass. Sonnet 5 and Gemini 3 Pro are the best default price/quality picks at roughly $1.20-$2.40 per language for 1,000 strings.
Why is per-string LLM translation more expensive than raw string tokens suggest?
Every call also carries the system prompt, glossary of brand terms, format specification for placeholders, and context about the surrounding screen. Each of those adds a few hundred tokens, which is how an unbatched call reaches roughly 1,200 input tokens. Batching 10-20 strings per call amortizes that overhead to roughly 400 tokens per string.
What are the main failure modes of LLM translation?
Placeholder drift (variables renamed into another language and broken at runtime), length overflow (German runs 30% longer than English and breaks buttons), and plural-form mishandling (Slavic and Arabic languages have 3-6 plural forms). All three are caught by a verification pass and schema-enforced output.
Which LLM should I use for i18n translation?
Sonnet 5 for most cases; Gemini 3 Pro for large Asian-language batches; Opus 4.8 only for tone-sensitive marketing or legal-adjacent copy where nuance matters; Haiku 4.5 or DeepSeek V3 for verification passes or low-stakes languages.
When should I use a human translator instead of an LLM?
For regulatory content requiring certified translation (medical, legal, finance), marketing hero copy where brand voice trumps throughput, and low-resource languages where model training data is thin. For everything else (settings, errors, tooltips, form labels), LLM translation at $1-$3 per 1K strings is now the right default.