How to Count Tokens Before Sending: Tokenizer Tools, Prompt Sizing, and Cost Control for Coding Agents

By Eric Bush · June 30, 2026 · 8 min read


Why Token Counting Matters Before You Hit Send

Most AI coding bill surprises come from one mistake: sending more tokens per call than you thought. A 50K-line repo you dragged into context "to give the model the full picture" is 200K-400K tokens of input. On Claude Opus 4.8 at $15/M input, that single call costs $3-6, and if your agent ends up making it five times across retries, you've spent up to $30 on what felt like a free operation.

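The fix is boring and reliable: count before you send. Three categories of tools make this practical: official tokenizer libraries you run locally, token-counting API endpoints, and quick character-based estimates for sanity checks.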

Tokenizer Library by Provider

Provider	Tokenizer or counting method	How to access
Anthropic Claude	`count_tokens` API endpoint (the older `@anthropic-ai/tokenizer` package is only a rough approximation for Claude 3 and later)	HTTP API / official SDKs
OpenAI GPT	`tiktoken` (Python) / `js-tiktoken`	Python / JavaScript / Rust
Google Gemini	`countTokens` API endpoint	SDK or REST call (no charge)
DeepSeek	DeepSeek tokenizer (HuggingFace)	Python via `transformers`
Generic estimate	~4 chars/token (English) or ~2 chars/token (CJK)	Quick sanity check

Quick Wins You Can Apply Today

Five small habits that cut prompt size without quality loss:

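1. Strip comments and blank lines from code context. A typical commented codebase has 20-40% non-code lines, and most coding tasks don't need them. If those lines carry a proportional share of tokens, the savings on a 100K-token codebase context are 20K-40K tokens per call; blank lines cost almost nothing, so real savings depend on how comment-heavy the code is.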

2. Exclude generated files. package-lock.json, yarn.lock, build outputs, vendor directories, type-generated files. Easy to include accidentally; high token cost; near-zero information content for the model.

3. Send file summaries, not full files, on the first pass. A 500-line file's first-pass summary is 100-200 tokens. Send the summary; let the agent ask for the full file if it needs it. Most tasks never need the full file.

4. Use diffs instead of full files for change-review tasks. git diff output for a typical PR is 500-3000 tokens, while the before-and-after versions of the same files are 10-50K tokens. For reviewing the change itself, the diff carries the same information.

5. Cap conversation history at N turns. Most coding agents resend the full conversation history as input on every turn, so after 20 turns it dominates input cost. A rolling window of the last 5-8 turns plus a summary of older content keeps per-call cost roughly flat.
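Habits 1, 2, and 5 are mechanical enough to script. Below is a minimal Python sketch, not a production tool: the directory and file-name lists, the per-extension comment markers, and the helper names (`is_generated`, `strip_comments_and_blanks`, `build_context`, `cap_history`) are all hypothetical. The comment stripper only drops blank lines and full-line comments, so trailing comments, block comments, and docstrings stay in place, and a `#` line inside a Python multi-line string would be wrongly removed.

```python
import fnmatch
from pathlib import PurePosixPath

# Habit 2: directories and file-name patterns treated as generated (illustrative, not exhaustive).
GENERATED_DIRS = {"node_modules", "vendor", "dist", "build"}
GENERATED_FILE_PATTERNS = ("package-lock.json", "yarn.lock", "*.min.js")

# Habit 1: full-line comment markers per extension (hypothetical mapping; extend as needed).
# Keyed by extension so a C "#include" is never mistaken for a comment.
LINE_COMMENT_MARKERS = {
    ".py": ("#",),
    ".sh": ("#",),
    ".js": ("//",),
    ".ts": ("//",),
    ".go": ("//",),
    ".rs": ("//",),
}


def is_generated(path: str) -> bool:
    """True if any parent directory or the file name marks the file as generated."""
    p = PurePosixPath(path)
    if any(part in GENERATED_DIRS for part in p.parts[:-1]):
        return True
    return any(fnmatch.fnmatchcase(p.name, pat) for pat in GENERATED_FILE_PATTERNS)


def strip_comments_and_blanks(path: str, source: str) -> str:
    """Drop blank lines and full-line comments; trailing and block comments are kept."""
    markers = LINE_COMMENT_MARKERS.get(PurePosixPath(path).suffix, ())
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if markers and stripped.startswith(markers):
            continue  # full-line comment in a language we have markers for
        kept.append(line)
    return "\n".join(kept)


def build_context(files: dict[str, str]) -> str:
    """Assemble code context from {path: source}, skipping generated files."""
    sections = []
    for path in sorted(files):
        if is_generated(path):
            continue
        sections.append(f"### {path}\n{strip_comments_and_blanks(path, files[path])}")
    return "\n\n".join(sections)


def cap_history(turns: list[dict], keep_last: int = 6) -> tuple[list[dict], list[dict]]:
    """Habit 5: split history into (older, recent); only `recent` is sent verbatim."""
    start = max(len(turns) - keep_last, 0)
    # Begin the window on a user turn, since some chat APIs require the first message to be from the user.
    while start < len(turns) and turns[start].get("role") != "user":
        start += 1
    return turns[:start], turns[start:]


example_repo = {
    "package-lock.json": '{"lockfileVersion": 3}',
    "src/app.py": "# entry point\n\nimport os\n\nprint(os.getcwd())\n",
    "vendor/lib/util.js": "// vendored\nexport const x = 1;\n",
}
print(build_context(example_repo))
# ### src/app.py
# import os
# print(os.getcwd())
```

`cap_history` only splits the history. Producing the summary of the older turns (for example, with a cheap model call) is left out, and where that summary goes, the system prompt or a leading message, depends on your agent.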

Pre-Send Sizing Workflow

A simple, reliable workflow before sending any large prompt:

Step 1 — Assemble the full prompt locally. System prompt + tool descriptions + context files + user message + conversation history.

Step 2 — Run the appropriate tokenizer. Get a real token count, not a character-based estimate.

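Step 3 — Multiply by input price. Token count × input price per million tokens gives the input cost. Then add the expected output: for an estimated 2-5K output tokens, multiply by the output price per million (output tokens are usually priced higher than input) and add that to the bill.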

Step 4 — Compare against a threshold. Set a per-call cost threshold ($0.50, $1, or $5, depending on team comfort). Above the threshold, prompt for confirmation or auto-truncate.
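The four steps above fit in one small function. This is a minimal sketch under a few assumptions: the helper names (`preflight`, `PreflightResult`, `tiktoken_counter`) are hypothetical, the token counter is passed in so you can plug in whichever tokenizer matches your model, and the $75/M output price in the example is a placeholder rather than a quoted rate. Counting one flattened string also slightly undercounts the framing overhead an API adds around messages and tool definitions; for Claude, the provider's `count_tokens` endpoint counts the real payload.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PreflightResult:
    input_tokens: int
    estimated_cost_usd: float
    over_threshold: bool


def preflight(
    prompt_parts: Sequence[str],
    count_tokens: Callable[[str], int],
    input_price_per_m: float,
    output_price_per_m: float,
    expected_output_tokens: int = 5_000,
    threshold_usd: float = 1.00,
) -> PreflightResult:
    # Step 1: assemble system prompt, tool descriptions, context, history, and user message.
    full_prompt = "\n\n".join(prompt_parts)
    # Step 2: count with the caller-supplied tokenizer for the target model.
    input_tokens = count_tokens(full_prompt)
    # Step 3: prices are per million tokens; input and expected output are priced separately.
    cost = (
        input_tokens * input_price_per_m
        + expected_output_tokens * output_price_per_m
    ) / 1_000_000
    # Step 4: flag anything over the per-call budget for confirmation or truncation.
    return PreflightResult(input_tokens, cost, cost > threshold_usd)


def tiktoken_counter(encoding_name: str = "o200k_base") -> Callable[[str], int]:
    """Real counter for OpenAI models (requires `pip install tiktoken`)."""
    import tiktoken  # third-party; imported lazily so the rest runs without it

    enc = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() counts text such as "<|endoftext|>" as plain text instead of raising.
    return lambda text: len(enc.encode(text, disallowed_special=()))


# Example with the article's $15/M input price; the output price is a placeholder.
result = preflight(
    ["You are a coding agent.", "<context files>", "Fix the failing test."],
    count_tokens=lambda text: 200_000,  # stand-in for a real tokenizer count
    input_price_per_m=15.0,
    output_price_per_m=75.0,
)
print(result)  # estimated_cost_usd=3.375, over_threshold=True
```

Keeping the counter as a parameter lets the same check run against `tiktoken_counter()` for OpenAI models, a HuggingFace tokenizer for DeepSeek, or a wrapper around a provider's counting endpoint.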

Real-World Savings

A team running a coding agent at scale that adds a pre-send tokenizer and a per-call cost preview typically sees:

Metric	Before	After
Average tokens per agent call	~120K	~45K
Calls above $1 cost (per developer per day)	8-12	1-2
Monthly bill per developer	$220	$90
Task completion rate	Baseline	Comparable (-2% to +5% vs. baseline)

The headline: roughly 60% cost reduction ($220 to $90 per developer per month) with no meaningful change in task completion. The savings come from sending the right amount of context, not more.

Specific Tools Worth Knowing

Tiktokenizer.app — web-based, supports OpenAI, Claude, and Llama tokenizers. Paste text, see token count and visual tokenization. Free.

Anthropic count_tokens API endpoint — POST to /v1/messages/count_tokens with the same model, system prompt, messages, and tools you'd send to /v1/messages. Returns the input token count. Free to call, subject to rate limits.

OpenAI tiktoken — pip install tiktoken. It's a library rather than a standalone CLI, but a few lines of Python will count tokens in a file or piped stdin, which makes it useful in build scripts and CI.

ai-cost-estimator — for ballpark project-level cost estimates without per-prompt counting; useful for "is this whole project going to cost $5K or $50K?" planning.
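The count_tokens endpoint listed above needs nothing beyond the standard library. This is a minimal sketch: the URL path, the `x-api-key` and `anthropic-version: 2023-06-01` headers, and the `input_tokens` response field follow Anthropic's public API docs, while the helper names and the set of stripped generation-only keys are assumptions for illustration.

```python
import json
import urllib.request

COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens"

# Generation-only settings a count request doesn't need. Stripping them is an
# assumption made for safety; the endpoint counts input tokens only.
GENERATION_ONLY_KEYS = {
    "max_tokens", "temperature", "top_p", "top_k", "stop_sequences", "stream", "metadata",
}


def build_count_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Reuse a /v1/messages payload (model, system, messages, tools) for counting."""
    body = {k: v for k, v in payload.items() if k not in GENERATION_ONLY_KEYS}
    return urllib.request.Request(
        COUNT_TOKENS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def count_input_tokens(payload: dict, api_key: str) -> int:
    """Send the request and return the `input_tokens` field of the JSON response."""
    with urllib.request.urlopen(build_count_request(payload, api_key)) as resp:
        return json.loads(resp.read())["input_tokens"]
```

Call `count_input_tokens(payload, os.environ["ANTHROPIC_API_KEY"])` with the same dict you would POST to `/v1/messages`, filling in your own model ID. If you already use the official `anthropic` Python SDK, `client.messages.count_tokens(...)` wraps the same endpoint.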

Pre-send token counting is one of the least exciting and most reliably effective AI coding cost interventions. Build it into your agent's pipeline once; benefit from it on every call thereafter.


Frequently Asked Questions

What's the fastest way to estimate tokens for a quick sanity check?

Divide character count by 4 for English text and by 2 for CJK languages. For code, divide by ~3.5: code yields more tokens per character than English prose, but fewer than Chinese. This usually gets you within about 20% of the real tokenizer count.
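Those rules of thumb fit in a few lines. This sketch counts CJK characters separately so mixed-language text gets a blended estimate; the function names are hypothetical, and the Unicode ranges are a simplification that skips, for example, the CJK extension blocks and full-width punctuation.

```python
import math

# Characters per token, from the rules of thumb above. Ballpark only.
CHARS_PER_TOKEN = {"english": 4.0, "code": 3.5, "cjk": 2.0}


def is_cjk(ch: str) -> bool:
    """Rough CJK check: Han ideographs, Japanese kana, and Hangul syllables."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF  # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF  # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF  # Hangul Syllables
    )


def rough_token_estimate(text: str, kind: str = "english") -> int:
    """CJK characters at ~2 chars/token; everything else at the ratio for `kind`."""
    cjk_chars = sum(1 for ch in text if is_cjk(ch))
    other_chars = len(text) - cjk_chars
    estimate = cjk_chars / CHARS_PER_TOKEN["cjk"] + other_chars / CHARS_PER_TOKEN[kind]
    return math.ceil(estimate)


print(rough_token_estimate("x" * 350, kind="code"))  # 100
```

Treat the result as a sanity check only; before a costly call, count with the tokenizer that matches the target model.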

Does Anthropic charge for the count_tokens API endpoint?

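No. Calling `/v1/messages/count_tokens` is free (subject to rate limits) and returns the input token count for a given message payload; Anthropic describes the result as an estimate that can differ slightly from the count on the actual call. Use it before any large call to confirm cost.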

Why does the same text produce different token counts on different models?

Each model family uses a different tokenizer with different vocabulary and merge rules. The same 100-word paragraph might be 130 tokens for Claude, 125 for GPT, and 145 for Llama. Always use the tokenizer matching your target model.

What's the single biggest waste of tokens in AI coding agents?

Including generated files (package-lock.json, build outputs, vendor directories) in code context. These are easy to add accidentally, contain near-zero information for the model, and can double or triple input token count on a typical repo.
