← Back to Blog

AI Observability Stack Cost: OpenTelemetry + Grafana + Custom Traces for Coding Agent Fleets

By Eric Bush · July 3, 2026 · 9 min read

A monitoring dashboard displaying real-time system metrics and traces on a workstation

Why AI Observability Is Different

Traditional application observability tracks requests, latency, and error rates. For AI coding agents, you need to track those plus:

  • Token consumption per request, session, user, and workflow.
  • Cache hit rates on prompt caching.
  • Model routing decisions (which model handled which request and why).
  • Tool call sequences within an agent trajectory.
  • Retry patterns and their cost.
  • Cost attribution to business unit or feature.

Without these, you cannot answer basic operational questions like "why did our AI bill spike last Tuesday" or "which repos consume the most tokens." This post prices out the observability stack that gets you those answers.

The Three Stack Layers

Every AI observability stack has three layers, whether you buy them separately or as an integrated product:

  1. Instrumentation. The code in your agent that emits spans, metrics, and events. OpenTelemetry is the standard.
  2. Transport and storage. Where the telemetry data goes: Grafana Cloud, Datadog, Honeycomb, New Relic, or self-hosted.
  3. Query and visualization. Dashboards, alerts, and ad-hoc queries. Grafana, Datadog, or custom.

Instrumentation Cost

Instrumentation itself is nearly free — OpenTelemetry SDKs are open source. The real cost is engineer time:

  • Basic instrumentation: Wrap Anthropic/OpenAI SDK calls to emit spans with token counts, model name, and latency. About 2–4 engineer-hours.
  • Agent trajectory tracing: Emit spans for each tool call and thought step, with parent-child relationships. 8–16 engineer-hours.
  • Cost attribution: Add user, session, workflow, and cost-center attributes to every span. 4–8 engineer-hours.
  • Ongoing maintenance: Updates as new agent features ship. 2–4 hours/month.

One-time cost: about $3,000–$6,000 in engineer time. Ongoing: about $400/month.

Transport and Storage Cost by Provider

Cost varies dramatically by ingest volume. Assume a 20-developer team generating 100 traces per developer per day, each with 30 spans averaging 400 bytes: about 24M spans/month or roughly 10 GB/month of trace data plus metrics and logs.

Vendor Monthly cost (10GB) Notes
Grafana Cloud Pro $100–$200 Includes metrics, logs, traces
Datadog APM + Logs $450–$800 Best UX but most expensive
Honeycomb $200–$400 Strong trace query capabilities
New Relic $300–$600 Consumption-based pricing
Self-hosted Tempo + Prometheus $50–$120 Requires EC2 or GKE, 2 engineer-days setup

The Custom Traces You Actually Need

Off-the-shelf observability tools capture what any HTTP client would produce. For AI coding agents you need custom span attributes:

  • llm.model — which model handled the request.
  • llm.tokens.input, llm.tokens.output, llm.tokens.cached_read — token accounting.
  • llm.cost_usd — computed cost of this specific call.
  • agent.trajectory.id — group all spans for one agent task.
  • agent.tool.name, agent.tool.arguments_size — which tools ran and with what.
  • agent.retry.count — how many times we retried this task.
  • business.workflow, business.cost_center — for chargeback.

Standard OTel semantic conventions cover some of this in the gen_ai.* namespace. Custom attributes fill the rest.

Full Stack Cost for a 20-Developer Team

Component Vendor (Grafana Cloud) Self-hosted
Storage & ingest $150/mo $85/mo (EC2 + S3)
Instrumentation dev time $400/mo amortized $400/mo amortized
Dashboards & alerts Included $15/mo (Grafana OSS + PagerDuty free tier)
Ops overhead $0 $300/mo (10% of a senior engineer)

Grafana Cloud total: about $550/month. Self-hosted total: about $800/month when engineer time is honestly counted. The apparent savings of self-hosting evaporate on smaller teams.

When Self-Hosted Wins

The math flips in favor of self-hosting only at larger scale:

  • Above 200 developers: Ingest volumes push Datadog and Grafana Cloud into the $2K+/month range. Tempo on Kubernetes handles the same volume for $300/month plus fixed engineer time.
  • Compliance requirements: HIPAA, FedRAMP, or SOC2 workloads sometimes mandate specific data residency that vendor tools do not support cleanly.
  • Team already runs Prometheus and Grafana: Adding Tempo alongside is trivial, so the marginal ops cost is near zero.

The One Dashboard You Actually Need

Do not build ten dashboards. Build one: Cost per successful task, sliced by model and workflow. That single view answers 80% of the operational questions. Additional dashboards are worth building only when a specific question comes up repeatedly.

Alert on three things:

  1. Any workflow's daily cost exceeds 2x its 7-day average.
  2. Prompt cache hit rate on any workflow drops below 40%.
  3. Retry count on any workflow exceeds 3x its 7-day average.

Recommendation

  • For teams under 50 developers, use Grafana Cloud. Best cost, minimal setup, complete stack.
  • For teams above 200 developers or with compliance mandates, use self-hosted Tempo + Prometheus + Grafana. Costs pass through Grafana Cloud around 100 developers.
  • Do not use Datadog unless you already do. It is the best UX but 3x the cost — for AI observability specifically, that premium is hard to justify.
  • Custom span attributes are essential. Standard OTel gen_ai semantic conventions cover the basics but miss cost attribution and business-unit fields.
  • Build one great dashboard before ten mediocre ones. Cost per successful task is the answer.

Want to calculate exact costs for your project?

Frequently Asked Questions

How much does an AI observability stack cost?

For a 20-developer team: about $550/month all-in on Grafana Cloud, or roughly $800/month self-hosted when engineer time is counted honestly. Datadog for the same workload runs $1,200+/month.

Which vendor is best for AI coding agent observability?

Grafana Cloud for teams under 50 developers — best value with a complete stack. Self-hosted Tempo + Prometheus + Grafana becomes competitive at 200+ developers. Datadog has the best UX but is 3x the cost, hard to justify for AI-specific workloads.

What custom span attributes do I need for AI observability?

Model name, token counts (input, output, cached), computed cost, agent trajectory ID, tool name, retry count, and business unit or workflow tag. Standard OpenTelemetry gen_ai semantic conventions cover the basics but miss cost attribution.

Do I need traces or are metrics enough?

Traces are essential for AI agents. Metrics tell you 'requests were slow' — traces tell you 'the router decided to use Opus, the first tool call took 8s, and we retried twice.' The parent-child span relationships let you compute cost per agent trajectory, which pure metrics cannot.

Should I build many dashboards or one great one?

One great one. Cost per successful task, sliced by model and workflow, answers 80% of operational questions. Additional dashboards are worth building only when a specific question recurs. Start minimal.