AI Observability Stack Cost: OpenTelemetry + Grafana + Custom Traces for Coding Agent Fleets
By Eric Bush · July 3, 2026 · 9 min read
Why AI Observability Is Different
Traditional application observability tracks requests, latency, and error rates. For AI coding agents, you need to track those plus:
- Token consumption per request, session, user, and workflow.
- Cache hit rates on prompt caching.
- Model routing decisions (which model handled which request and why).
- Tool call sequences within an agent trajectory.
- Retry patterns and their cost.
- Cost attribution to business unit or feature.
Without these, you cannot answer basic operational questions like "why did our AI bill spike last Tuesday" or "which repos consume the most tokens." This post prices out the observability stack that gets you those answers.
The Three Stack Layers
Every AI observability stack has three layers, whether you buy them separately or as an integrated product:
- Instrumentation. The code in your agent that emits spans, metrics, and events. OpenTelemetry is the standard.
- Transport and storage. Where the telemetry data goes: Grafana Cloud, Datadog, Honeycomb, New Relic, or self-hosted.
- Query and visualization. Dashboards, alerts, and ad-hoc queries. Grafana, Datadog, or custom.
Instrumentation Cost
Instrumentation itself is nearly free — OpenTelemetry SDKs are open source. The real cost is engineer time:
- Basic instrumentation: Wrap Anthropic/OpenAI SDK calls to emit spans with token counts, model name, and latency. About 2–4 engineer-hours.
- Agent trajectory tracing: Emit spans for each tool call and thought step, with parent-child relationships. 8–16 engineer-hours.
- Cost attribution: Add user, session, workflow, and cost-center attributes to every span. 4–8 engineer-hours.
- Ongoing maintenance: Updates as new agent features ship. 2–4 hours/month.
One-time cost: 14–28 engineer-hours in total, or about $3,000–$6,000 in engineer time. Ongoing: about $400/month.
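The parent-child structure behind agent trajectory tracing can be sketched without any vendor SDK. The `MiniTracer` and `Span` classes below are hypothetical, in-memory stand-ins for a real OpenTelemetry tracer, and the model name, tool name, and token counts are made-up illustration values. The sketch shows only how nested spans pick up their parent from a stack of open spans.

```python
import itertools
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]   # None marks the root span of a trajectory
    name: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0


class MiniTracer:
    """In-memory stand-in for a real tracer; not thread- or async-safe."""

    def __init__(self):
        self.spans = []               # finished spans, in completion order
        self._open = []               # ids of spans currently in progress
        self._ids = itertools.count(1)

    @contextmanager
    def span(self, name, attributes=None):
        # The innermost open span (if any) becomes the parent of the new one.
        parent_id = self._open[-1] if self._open else None
        current = Span(next(self._ids), parent_id, name, dict(attributes or {}))
        self._open.append(current.span_id)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_s = time.perf_counter() - start
            self._open.pop()
            self.spans.append(current)


tracer = MiniTracer()
with tracer.span("agent.task", {"agent.trajectory.id": "traj-001"}):
    with tracer.span("llm.call") as call:
        # Token counts would come from the provider's response object.
        call.attributes.update({"llm.model": "example-model",
                                "llm.tokens.input": 1200,
                                "llm.tokens.output": 300})
    with tracer.span("tool.call", {"agent.tool.name": "read_file"}):
        pass

for s in tracer.spans:
    print(s.span_id, s.parent_id, s.name)
```

In a real deployment the OpenTelemetry SDK does this bookkeeping through its own context propagation. The point of the sketch is that every tool call and LLM call ends up linked to the task span that started it.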
Transport and Storage Cost by Provider
Cost varies dramatically by ingest volume. Assume a 20-developer team generating 100 traces per developer per day, each with 30 spans averaging 400 bytes: about 60,000 spans/day, or 1.8M spans/month and roughly 0.7 GB/month of raw span data. The pricing below uses roughly 10 GB/month of total ingest (traces plus metrics and logs).
| Vendor | Monthly cost (10 GB/month ingest) | Notes |
|---|---|---|
| Grafana Cloud Pro | $100–$200 | Includes metrics, logs, traces |
| Datadog APM + Logs | $450–$800 | Best UX but most expensive |
| Honeycomb | $200–$400 | Strong trace query capabilities |
| New Relic | $300–$600 | Consumption-based pricing |
| Self-hosted Tempo + Prometheus | $50–$120 | Requires EC2 or GKE, 2 engineer-days setup |
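To size ingest for a different team, the per-developer assumptions can be multiplied out as in the sketch below. The function name is hypothetical, the 30-day month is an assumption, and the result counts raw span payload only. Metrics, logs, and any per-span attribute overhead are not modeled.

```python
def monthly_trace_volume(developers, traces_per_dev_per_day, spans_per_trace,
                         bytes_per_span, days_per_month=30):
    """Return (spans_per_month, gigabytes_per_month) for raw span data."""
    spans = developers * traces_per_dev_per_day * spans_per_trace * days_per_month
    gigabytes = spans * bytes_per_span / 1e9   # decimal GB, as vendors usually bill
    return spans, gigabytes


# Inputs taken from the 20-developer scenario in this section.
spans, gb = monthly_trace_volume(
    developers=20,
    traces_per_dev_per_day=100,
    spans_per_trace=30,
    bytes_per_span=400,
)
print(f"{spans:,} spans/month, {gb:.2f} GB/month of raw span data")
```

Swapping in your own trace rate and span size shows quickly which vendor pricing tier applies.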
The Custom Traces You Actually Need
Off-the-shelf observability tools capture what any HTTP client would produce. For AI coding agents you need custom span attributes:
- llm.model — which model handled the request.
- llm.tokens.input, llm.tokens.output, llm.tokens.cached_read — token accounting.
- llm.cost_usd — computed cost of this specific call.
- agent.trajectory.id — group all spans for one agent task.
- agent.tool.name, agent.tool.arguments_size — which tools ran and with what.
- agent.retry.count — how many times we retried this task.
- business.workflow, business.cost_center — for chargeback.
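Standard OTel semantic conventions cover some of this in the gen_ai.* namespace (for example, gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens). Custom attributes fill the rest: cost, trajectory, retry, and business fields.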
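As a rough sketch, the attribute set above can be assembled in one helper that also computes llm.cost_usd. The `ModelPrice` class, the function name, and the per-million-token prices are hypothetical placeholders, not any provider's actual rates. The sketch also assumes that input tokens and cached-read tokens are reported and billed separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens. The values used below are placeholders."""
    input_per_mtok: float
    output_per_mtok: float
    cached_read_per_mtok: float


def llm_span_attributes(model, price, input_tokens, output_tokens,
                        cached_read_tokens, trajectory_id, workflow,
                        cost_center, retry_count=0):
    """Build the custom attributes for one LLM call span."""
    # Assumption: input_tokens excludes cached reads, which bill at their own rate.
    cost_usd = (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
        + cached_read_tokens * price.cached_read_per_mtok
    ) / 1_000_000
    return {
        "llm.model": model,
        "llm.tokens.input": input_tokens,
        "llm.tokens.output": output_tokens,
        "llm.tokens.cached_read": cached_read_tokens,
        "llm.cost_usd": round(cost_usd, 6),
        "agent.trajectory.id": trajectory_id,
        "agent.retry.count": retry_count,
        "business.workflow": workflow,
        "business.cost_center": cost_center,
    }


example_price = ModelPrice(input_per_mtok=3.0, output_per_mtok=15.0,
                           cached_read_per_mtok=0.30)  # hypothetical rates
attrs = llm_span_attributes(
    model="example-model", price=example_price,
    input_tokens=2_000, output_tokens=500, cached_read_tokens=10_000,
    trajectory_id="traj-001", workflow="bugfix", cost_center="platform",
)
print(attrs["llm.cost_usd"])
```

With the OpenTelemetry Python API, a dictionary like this can be attached to the active span with `span.set_attributes(attrs)`. Computing cost at emit time keeps dashboards simple, at the price of having to update the price table when rates change.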
Full Stack Cost for a 20-Developer Team
| Component | Vendor (Grafana Cloud) | Self-hosted |
|---|---|---|
| Storage & ingest | $150/mo | $85/mo (EC2 + S3) |
| Instrumentation dev time | $400/mo amortized | $400/mo amortized |
| Dashboards & alerts | Included | $15/mo (Grafana OSS + PagerDuty free tier) |
| Ops overhead | $0 | $300/mo (10% of a senior engineer) |
Grafana Cloud total: about $550/month. Self-hosted total: about $800/month when engineer time is honestly counted. The apparent savings of self-hosting evaporate on smaller teams.
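The totals can be checked by summing the table's rows. The dictionary keys below are illustrative labels for those rows. The last line also computes the self-hosted figure without ops overhead, which is where the apparent savings come from.

```python
# Monthly costs in USD, copied from the table above ("Included" counted as 0).
MONTHLY_COSTS_USD = {
    "Grafana Cloud": {
        "storage_ingest": 150,
        "instrumentation": 400,
        "dashboards_alerts": 0,
        "ops_overhead": 0,
    },
    "Self-hosted": {
        "storage_ingest": 85,
        "instrumentation": 400,
        "dashboards_alerts": 15,
        "ops_overhead": 300,
    },
}

totals = {stack: sum(parts.values()) for stack, parts in MONTHLY_COSTS_USD.items()}

# Self-hosted cost if the ops overhead line is left out of the comparison.
self_hosted_without_ops = (
    totals["Self-hosted"] - MONTHLY_COSTS_USD["Self-hosted"]["ops_overhead"]
)

print(totals)
print(self_hosted_without_ops)
```

Without the ops line, self-hosting comes to $500/month against $550/month for Grafana Cloud. With it, the comparison reverses to $800 against $550.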
When Self-Hosted Wins
The math flips in favor of self-hosting only at larger scale:
- Above 200 developers: Ingest volumes push Datadog and Grafana Cloud into the $2K+/month range. Tempo on Kubernetes handles the same volume for $300/month plus fixed engineer time.
- Compliance requirements: HIPAA, FedRAMP, or SOC 2 workloads sometimes carry data-residency requirements that vendor tools do not support cleanly.
- Team already runs Prometheus and Grafana: Adding Tempo alongside is trivial, so the marginal ops cost is near zero.
The One Dashboard You Actually Need
Do not build ten dashboards. Build one: Cost per successful task, sliced by model and workflow. That single view answers 80% of the operational questions. Additional dashboards are worth building only when a specific question comes up repeatedly.
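One way to compute that metric is sketched below, assuming each finished agent task has already been rolled up into a record with `model`, `workflow`, `cost_usd`, and `succeeded` fields. Those field names and the sample records are hypothetical. Failed tasks add to spend but not to the denominator, which is what separates this metric from plain average cost per task.

```python
from collections import defaultdict


def cost_per_successful_task(tasks):
    """Return {(model, workflow): cost per successful task, or None if none succeeded}."""
    spend = defaultdict(float)
    successes = defaultdict(int)
    for task in tasks:
        key = (task["model"], task["workflow"])
        spend[key] += task["cost_usd"]      # failed attempts still cost money
        if task["succeeded"]:
            successes[key] += 1
    return {
        key: (spend[key] / successes[key] if successes[key] else None)
        for key in spend
    }


sample_tasks = [
    {"model": "model-a", "workflow": "bugfix", "cost_usd": 0.40, "succeeded": True},
    {"model": "model-a", "workflow": "bugfix", "cost_usd": 0.20, "succeeded": False},
    {"model": "model-a", "workflow": "bugfix", "cost_usd": 0.60, "succeeded": True},
    {"model": "model-b", "workflow": "refactor", "cost_usd": 0.50, "succeeded": False},
]
result = cost_per_successful_task(sample_tasks)
for key, value in result.items():
    print(key, value)
```

A slice with spend but no successes returns `None` rather than zero, so it appears on the dashboard as a problem instead of looking free.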
Alert on three things:
- Any workflow's daily cost exceeds 2x its 7-day average.
- Prompt cache hit rate on any workflow drops below 40%.
- Retry count on any workflow exceeds 3x its 7-day average.
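The three rules can be expressed as a single check per workflow, as in the sketch below. The function name, alert labels, and sample numbers are hypothetical. The sketch assumes the 7-day average covers the seven days before today, and that the cache hit rate is a fraction between 0 and 1.

```python
from statistics import mean


def workflow_alerts(today_cost, prior_7d_costs, cache_hit_rate,
                    today_retries, prior_7d_retries):
    """Return the list of alert names that fire for one workflow today."""
    alerts = []
    # Rule 1: daily cost above 2x the trailing 7-day average.
    if prior_7d_costs and today_cost > 2 * mean(prior_7d_costs):
        alerts.append("cost_spike")
    # Rule 2: prompt cache hit rate below 40%.
    if cache_hit_rate < 0.40:
        alerts.append("low_cache_hit_rate")
    # Rule 3: retries above 3x the trailing 7-day average.
    if prior_7d_retries and today_retries > 3 * mean(prior_7d_retries):
        alerts.append("retry_spike")
    return alerts


fired = workflow_alerts(
    today_cost=50.0, prior_7d_costs=[20.0] * 7,
    cache_hit_rate=0.35,
    today_retries=10, prior_7d_retries=[4] * 7,
)
print(fired)
```

In production these would more likely live as alert rules in Grafana or the vendor's alerting layer. The Python version is mainly useful for backtesting the thresholds against historical data before wiring up pages.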
Recommendation
- For teams under 50 developers, use Grafana Cloud. Best cost, minimal setup, complete stack.
- For teams above 200 developers or with compliance mandates, use self-hosted Tempo + Prometheus + Grafana. Self-hosted costs drop below Grafana Cloud at around 100 developers.
- Do not use Datadog unless you already do. It has the best UX but costs roughly 3x what Grafana Cloud does for storage and ingest; for AI observability specifically, that premium is hard to justify.
- Custom span attributes are essential. Standard OTel gen_ai semantic conventions cover the basics but miss cost attribution and business-unit fields.
- Build one great dashboard before ten mediocre ones. Cost per successful task is the answer.
Frequently Asked Questions
How much does an AI observability stack cost?
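For a 20-developer team: about $550/month all-in on Grafana Cloud, or roughly $800/month self-hosted when engineer time is counted honestly. Datadog for the same workload runs roughly $850–$1,200/month once the same $400/month of instrumentation time is added to its $450–$800 storage bill.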
Which vendor is best for AI coding agent observability?
Grafana Cloud for teams under 50 developers — best value with a complete stack. Self-hosted Tempo + Prometheus + Grafana becomes competitive at 200+ developers. Datadog has the best UX but is 3x the cost, hard to justify for AI-specific workloads.
What custom span attributes do I need for AI observability?
Model name, token counts (input, output, cached), computed cost, agent trajectory ID, tool name, retry count, and business unit or workflow tag. Standard OpenTelemetry gen_ai semantic conventions cover the basics but miss cost attribution.
Do I need traces or are metrics enough?
Traces are essential for AI agents. Metrics tell you 'requests were slow' — traces tell you 'the router decided to use Opus, the first tool call took 8s, and we retried twice.' The parent-child span relationships let you compute cost per agent trajectory, which pure metrics cannot.
Should I build many dashboards or one great one?
One great one. Cost per successful task, sliced by model and workflow, answers 80% of operational questions. Additional dashboards are worth building only when a specific question recurs. Start minimal.