← Back to Blog

How to Count Tokens Before Sending: Tokenizer Tools, Prompt Sizing, and Cost Control for Coding Agents

By Eric Bush · June 30, 2026 · 8 min read

Digital weighing scale on a workbench with cards labeled tokens, words, and characters

Why Token Counting Matters Before You Hit Send

Most AI coding bill surprises come from one mistake: sending more tokens per call than you thought. A 50K-line repo you dragged into context "to give the model the full picture" is 200K-400K tokens of input. On Claude Opus 4.8 at $15/M input, that single call is $3-6 — and if your agent retries five times you've spent $30 on what felt like a free operation.

The fix is boring and reliable: count before you send. Three categories of tools make this practical.

Tokenizer Library by Provider

Provider Official Tokenizer Language
Anthropic Claude @anthropic-ai/tokenizer JavaScript / Python
OpenAI GPT tiktoken (Python) / js-tiktoken Python / JavaScript / Rust
Google Gemini countTokens API endpoint SDK call (small fee)
DeepSeek DeepSeek tokenizer (HuggingFace) Python via transformers
Generic estimate ~4 chars/token (English) or ~2 chars/token (CJK) Quick sanity check

Quick Wins You Can Apply Today

Five small habits that cut prompt size without quality loss:

1. Strip comments and blank lines from code context. A typical commented codebase has 20-40% non-code lines. Most coding tasks don't need them. The savings on a 100K-token codebase context: 20K-40K tokens per call.

2. Exclude generated files. package-lock.json, yarn.lock, build outputs, vendor directories, type-generated files. Easy to include accidentally; high token cost; near-zero information content for the model.

3. Send file summaries, not full files, on the first pass. A 500-line file's first-pass summary is 100-200 tokens. Send the summary; let the agent ask for the full file if it needs it. Most tasks never need the full file.

4. Use diffs instead of full files for change-review tasks. git diff output for a typical PR is 500-3000 tokens. The before+after of the same change is 10-50K tokens. Same information.

5. Cap conversation history at N turns. Most coding agents pile on conversation history as input on every turn. After 20 turns this dominates input cost. A rolling window of the last 5-8 turns plus a summary of older content keeps cost flat.

Pre-Send Sizing Workflow

A simple, reliable workflow before sending any large prompt:

Step 1 — Assemble the full prompt locally. System prompt + tool descriptions + context files + user message + conversation history.

Step 2 — Run the appropriate tokenizer. Get a real token count, not a character-based estimate.

Step 3 — Multiply by input price. Token count × price/M tokens. For an estimated output of 2-5K tokens, add that to the bill.

Step 4 — Compare against a threshold. Per-call cost threshold of $0.50 / $1 / $5 depending on team comfort. Above the threshold: prompt for confirmation or auto-truncate.

Real-World Savings

A team running a coding agent at scale that adds a pre-send tokenizer + per-call cost preview typically sees:

Metric Before After
Average tokens per agent call ~120K ~45K
Calls above $1 cost (per developer per day) 8-12 1-2
Monthly bill per developer $220 $90
Task completion rate Comparable (-2% to +5%) Comparable (-2% to +5%)

The headline: 60% cost reduction with no measurable quality loss. The savings come from sending the right amount of context, not more.

Specific Tools Worth Knowing

Tiktokenizer.app — web-based, supports OpenAI, Claude, and Llama tokenizers. Paste text, see token count and visual tokenization. Free.

Anthropic count_tokens API endpoint — POST to /v1/messages/count_tokens with the same payload you'd send to messages. Returns token count. Free to call.

OpenAI tiktoken CLIpip install tiktoken, then count tokens in a file or piped stdin. Useful in build scripts and CI.

ai-cost-estimator — for ballpark project-level cost estimates without per-prompt counting; useful for "is this whole project going to cost $5K or $50K?" planning.

Pre-send token counting is one of the least exciting and most reliably effective AI coding cost interventions. Build it into your agent's pipeline once; benefit from it on every call thereafter.

Want to calculate exact costs for your project?

Frequently Asked Questions

What's the fastest way to estimate tokens for a quick sanity check?

Divide character count by 4 for English text, by 2 for CJK languages. For code, divide character count by ~3.5 (denser than English but less dense than Chinese). This gets you within 20% of the real tokenizer count.

Does Anthropic charge for the count_tokens API endpoint?

No. Calling <code>/v1/messages/count_tokens</code> is free and returns the exact token count for a given message payload. Use it before any large call to confirm cost.

Why does the same text produce different token counts on different models?

Each model family uses a different tokenizer with different vocabulary and merge rules. The same 100-word paragraph might be 130 tokens for Claude, 125 for GPT, and 145 for Llama. Always use the tokenizer matching your target model.

What's the single biggest waste of tokens in AI coding agents?

Including generated files (package-lock.json, build outputs, vendor directories) in code context. These are easy to add accidentally, contain near-zero information for the model, and can double or triple input token count on a typical repo.