
AI-Generated OpenAPI / Swagger Spec Cost: Per Endpoint Token Math for REST APIs

By Eric Bush · July 4, 2026 · 9 min read


The Naive Estimate vs Reality

Ask an LLM to generate an OpenAPI 3 spec for one REST endpoint, and it will happily produce 20-40 lines of YAML for a few hundred tokens. Multiply by 50 endpoints, and the bill comes to under $3 for a full API on Sonnet 5. That is the naive estimate, and it understates the real cost by roughly 4x.

What the naive estimate skips:

  • Request/response body schema traversal (deep object references, cross-endpoint reuse).
  • Example generation for every request and response sample the spec advertises.
  • Consistency review across endpoints (same field named consistently, same auth scheme, same error format).
  • Second-pass reconciliation with actual runtime behavior.

Per-Endpoint Token Anatomy

Realistic token consumption for a single REST endpoint with 5-8 fields in its request and response bodies:

| Task | Input tokens | Output tokens |
|---|---|---|
| Reading route handler + type definitions | ~1,500 | - |
| Generating base OpenAPI YAML | ~500 (system) | ~600 |
| Generating request/response examples | ~800 | ~300 |
| Consistency check vs prior endpoints | ~2,000 | ~150 |
| Per endpoint total | ~4,800 | ~1,050 |
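The totals in the table can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and no per-token prices are assumed, so `project_cost_usd` takes them as arguments (USD per million tokens) and should be fed from your provider's current price list. It models a single generation pass; any lint-fix or traffic cross-check rounds would come on top.

```python
# Per-endpoint token figures from the table above: (input, output).
STAGES = {
    "read handler + types": (1_500, 0),
    "base OpenAPI YAML": (500, 600),
    "request/response examples": (800, 300),
    "consistency check": (2_000, 150),
}


def per_endpoint_tokens(stages=STAGES):
    """Sum input and output tokens across all stages for one endpoint."""
    total_in = sum(tokens_in for tokens_in, _ in stages.values())
    total_out = sum(tokens_out for _, tokens_out in stages.values())
    return total_in, total_out


def project_tokens(endpoints, stages=STAGES):
    """Scale the per-endpoint totals to a whole API."""
    total_in, total_out = per_endpoint_tokens(stages)
    return total_in * endpoints, total_out * endpoints


def project_cost_usd(endpoints, usd_per_mtok_in, usd_per_mtok_out, stages=STAGES):
    """Cost of one pass; prices are caller-supplied, per million tokens."""
    tokens_in, tokens_out = project_tokens(endpoints, stages)
    return tokens_in / 1e6 * usd_per_mtok_in + tokens_out / 1e6 * usd_per_mtok_out


print(per_endpoint_tokens())  # (4800, 1050)
print(project_tokens(50))     # (240000, 52500)
```

Keeping input and output separate matters because providers typically price the two differently, and this workload is dominated by input tokens.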

Cost by Model for a 50-Endpoint API

| Model | Total cost (50 endpoints) | Notes |
|---|---|---|
| Opus 4.8 | $45-$65 | Best schema-consistency reasoning |
| Sonnet 5 | $8-$12 | Default; balanced quality |
| GPT-5.5 | $10-$15 | Strong on JSON structural output |
| Gemini 3 Pro | $6-$9 | Cheapest; weaker on cross-endpoint consistency |
| Haiku 4.5 | $3-$5 | Fine for a first draft; needs Sonnet review pass |

Where LLMs Fail on OpenAPI Generation

  • Ref reuse. Two endpoints that return the same shape should share a $ref component. Naive generation duplicates the schema inline, which inflates the spec and lets the copies drift apart on later edits.
  • Error responses. Models tend to omit 4xx/5xx documentation unless you prompt for it explicitly, or invent generic error shapes that do not match your actual runtime.
  • Security schemes. If your API uses JWT bearer, cookie, and API key on different endpoints, the model tends to declare one scheme globally instead of per-endpoint.
  • Deprecation and versioning. Rarely handled unless prompted; the model does not know your rollout plan.
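Two of the failure modes above, duplicated inline schemas and missing error responses, are mechanical enough to check without a model. The following is a minimal sketch, assuming the generated spec has already been parsed into a Python dict; the function names (`iter_operations`, `inline_response_schemas`, `audit`) are hypothetical, and it only looks at response bodies, not request bodies or parameters.

```python
import json
from collections import defaultdict

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def iter_operations(spec):
    """Yield (label, operation) for every HTTP operation under spec['paths']."""
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS:
                yield f"{method.upper()} {path}", op


def inline_response_schemas(op):
    """Yield response schemas written inline rather than as a $ref."""
    for response in op.get("responses", {}).values():
        for media in response.get("content", {}).values():
            schema = media.get("schema")
            if isinstance(schema, dict) and "$ref" not in schema:
                yield schema


def audit(spec):
    """Return (groups of operations sharing an inline schema, operations with no 4xx/5xx)."""
    seen = defaultdict(list)
    missing_errors = []
    for label, op in iter_operations(spec):
        for schema in inline_response_schemas(op):
            # Canonical JSON so key order does not hide identical shapes.
            seen[json.dumps(schema, sort_keys=True)].append(label)
        codes = [str(code) for code in op.get("responses", {})]
        if not any(code[:1] in ("4", "5") for code in codes):
            missing_errors.append(label)
    duplicates = [labels for labels in seen.values() if len(set(labels)) > 1]
    return duplicates, missing_errors


user = {"type": "object", "properties": {"id": {"type": "string"}}}
spec = {"paths": {
    "/users/{id}": {"get": {"responses": {
        "200": {"content": {"application/json": {"schema": user}}},
        "404": {"description": "Not found"}}}},
    "/me": {"get": {"responses": {
        "200": {"content": {"application/json": {"schema": dict(user)}}}}}},
}}
duplicates, missing_errors = audit(spec)
print(duplicates)       # [['GET /users/{id}', 'GET /me']]
print(missing_errors)   # ['GET /me']
```

Each group in `duplicates` is a candidate for extraction into a shared component, and each entry in `missing_errors` is an operation worth re-prompting for explicit error documentation.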

The Right Workflow

  1. Extract route handlers, type definitions, and existing tests into a bundle of ~2K-4K tokens.
  2. Pass the bundle to Sonnet 5 with a strict system prompt: "Return valid OpenAPI 3.1 YAML only. Reuse components via $ref when two endpoints share a schema."
  3. Run a linter (spectral lint) on the output. Feed any warnings back to the model for a fix pass.
  4. Cross-check against real HTTP traffic: capture 20-50 real requests and responses, feed them to the model alongside the generated spec, and ask it to flag mismatches.
  5. Commit the spec. Regenerate only on route-handler changes, not on every edit.
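For step 4, a deterministic pre-check can narrow down what the model needs to look at. This is not the model-based comparison described above but a complementary sketch: a hypothetical `find_mismatches` helper that compares one captured JSON body against an object schema from the spec. It assumes the schema is already resolved (no $ref) and only handles `type: object` with `properties` and `required`.

```python
def find_mismatches(schema, payload, path="$"):
    """Compare one captured JSON payload against a resolved object schema.

    Reports required fields absent from the payload and payload fields the
    schema does not declare, recursing into nested objects.
    """
    issues = []
    if schema.get("type") != "object" or not isinstance(payload, dict):
        return issues
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in payload:
            issues.append(f"{path}.{name}: required by spec, absent in traffic")
    for name, value in payload.items():
        if name not in properties:
            issues.append(f"{path}.{name}: present in traffic, not in spec")
        else:
            issues.extend(find_mismatches(properties[name], value, f"{path}.{name}"))
    return issues


# Illustrative schema and captured response; both are made up for this example.
schema = {
    "type": "object",
    "required": ["id", "email"],
    "properties": {
        "id": {"type": "string"},
        "email": {"type": "string"},
        "profile": {"type": "object", "properties": {"name": {"type": "string"}}},
    },
}
captured = {"id": "u_1", "profile": {"name": "Ada", "avatar_url": "https://example.com/a.png"}}

for issue in find_mismatches(schema, captured):
    print(issue)
# $.email: required by spec, absent in traffic
# $.profile.avatar_url: present in traffic, not in spec
```

Passing only the flagged fields and their endpoints to the model, rather than every captured exchange, should keep the cross-check pass small.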

Bottom Line

$8-$12 to spec a 50-endpoint API is unusually cheap next to the alternative (half a day of engineer time), but only if you run the consistency and lint passes. Skip those and you get a spec that looks impressive and fails the moment a client codegen runs against it.


Frequently Asked Questions

How much does it cost to auto-generate an OpenAPI spec with an LLM?

For a 50-endpoint REST API with Sonnet 5, expect $8-$12 including per-endpoint generation, example generation, and a consistency-check pass. Opus 4.8 runs $45-$65 for the same workload with better cross-endpoint consistency; Gemini 3 Pro ($6-$9) and Haiku 4.5 ($3-$5) are cheaper but require heavier review.

Why is per-endpoint token consumption higher than raw output suggests?

Each endpoint needs ~1,500 input tokens of route-handler and type-definition context, ~1,100 tokens for the base YAML (~500 of system prompt in, ~600 out), ~1,100 tokens for examples (~800 in, ~300 out), and ~2,150 tokens for a consistency check against prior endpoints (~2,000 in, ~150 out). That totals ~5,850 tokens per endpoint (~4,800 input, ~1,050 output), not the 500-1,000 that raw output alone suggests.

What do LLMs fail at most consistently in OpenAPI generation?

Component reuse via $ref (they inline duplicate schemas instead), error response documentation (often omitted or invented), per-endpoint security schemes (they default to one global scheme), and deprecation/versioning metadata (rarely handled unless prompted).

Should I use a linter on generated OpenAPI specs?

Yes. Run spectral lint after generation and feed warnings back to the model for a fix pass. Generating with Sonnet 5 plus one lint-fix round trip still costs less than an hour of engineer time and produces a spec that survives client codegen.

How often should I regenerate the OpenAPI spec?

Regenerate on route-handler or type-definition changes, not on every commit. Cache the last generated spec, diff against current handlers, and only re-run for changed endpoints. This cuts regeneration cost by 80-90% on active codebases.
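One way to implement the cache-and-diff idea is to fingerprint each endpoint's handler and type source and re-run generation only where the fingerprint changed. The sketch below assumes you can already map each endpoint to its relevant source text; the function names and the in-memory cache dict are hypothetical stand-ins for whatever storage you use.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Stable hash of the handler and type source for one endpoint."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def endpoints_to_regenerate(current_sources, cached_fingerprints):
    """Return endpoints that are new or whose source changed since the cached run."""
    return sorted(
        endpoint
        for endpoint, source in current_sources.items()
        if cached_fingerprints.get(endpoint) != fingerprint(source)
    )


# Fingerprints saved after the previous generation run (illustrative data).
cache = {
    "GET /users": fingerprint("def list_users(): ..."),
    "POST /users": fingerprint("def create_user(): ..."),
}
# Current state of the codebase: one handler edited, one endpoint added.
current = {
    "GET /users": "def list_users(): ...",
    "POST /users": "def create_user(payload): ...",
    "GET /health": "def health(): ...",
}

print(endpoints_to_regenerate(current, cache))  # ['GET /health', 'POST /users']
```

Hashing raw source means whitespace-only edits also trigger regeneration; normalizing the source before hashing would be a reasonable refinement if that proves noisy.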