
Why AI SREs Make No Cents.

Everyone is trying to sell you an AI SRE. Connect an agent to your existing stack, and it queries your logs, correlates your traces, and surfaces the root cause. Sounds like the last mile is finally solved.

It isn't. The economics don't work. Allow me to explain why.

Problem 1: Garbage in, garbage out. Instrumentation gaps are still an issue.

Agents are bounded by the telemetry they can observe. This is not a product or LLM limitation. An agent reasoning over incomplete traces cannot reconstruct what wasn't captured. When key telemetry is missing, not even G-d can save you.

As distributed architectures have grown more complex (multi-cloud environments, third-party API integrations, microservices), achieving high-fidelity observability requires proper and mindful instrumentation (Sapphire Ventures). Every service with missing spans, every log line stripped of context to reduce ingestion cost, every trace that stops at a service boundary: these are permanent blind spots for any agent sitting on top.
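To make "a trace that stops at a service boundary" concrete, here is a minimal sketch of how such a gap shows up in the data. The `Span` record and `find_orphan_spans` helper are hypothetical names invented for illustration, not part of any tracing library; real span formats carry many more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical minimal span record, for illustration only.
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    service: str


def find_orphan_spans(spans: list[Span]) -> list[Span]:
    """Return spans whose parent span was never captured."""
    known_ids = {s.span_id for s in spans}
    return [
        s for s in spans
        if s.parent_id is not None and s.parent_id not in known_ids
    ]


trace = [
    Span("a", None, "frontend"),
    Span("b", "a", "checkout"),
    # Span "c" (say, an uninstrumented payments service) was never emitted,
    # so its child "d" points at a parent that does not exist in the trace.
    Span("d", "c", "ledger"),
]

orphans = find_orphan_spans(trace)
print([s.service for s in orphans])  # → ['ledger']
```

An agent can detect that something is missing between checkout and ledger, but nothing in the trace tells it what happened inside the uninstrumented hop.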

And this isn't just a distributed systems problem. Even a simple stack (one backend, one frontend) runs into limitations due to instrumentation gaps.

"Well, let me solve that problem by instrumenting well."

Problem 2: The better you instrument, the more you pay.

Let's take a look at the pricing architecture of an entire generation of observability tooling. Every major platform meters on data volume:

  • Datadog: $0.10/GB ingested plus $1.70/million events indexed at 15-day retention (source)
  • New Relic: $0.30/GB above the 100 GB/month free tier on Standard, $0.50/GB on Data Plus (source)
  • Grafana Cloud: approximately $0.55/GB effective for logs, $0.50/GB for traces (source)
  • AWS CloudWatch: $0.50/GB for Standard log ingestion, $0.25/GB for Infrequent Access, on top of separate storage and query fees (source)

The problem compounds with software scale. GitHub's 2025 Octoverse report showed a 40% increase in new repository creation year-over-year, driven almost entirely by AI-assisted development (OneUptime). More code in production means more telemetry to ingest, and every gigabyte of it is metered.

The consequence is a structural perverse incentive: fixing Problem 1 (adding richer spans, more complete traces, higher-cardinality logs to feed the agent) directly increases your bill on every one of these platforms.
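A rough sketch of that incentive, using only the per-GB ingestion rates quoted in the list above. The monthly volumes are hypothetical, the Grafana figure is the approximate effective rate, and the model deliberately leaves out Datadog's per-event indexing fee, New Relic Data Plus, and CloudWatch storage and query fees, so real bills would be higher.

```python
# Per-GB ingestion rates (USD) taken from the pricing list in this post.
# Indexing, storage, and query fees are intentionally excluded.
RATES_PER_GB = {
    "Datadog (ingest only)": 0.10,
    "New Relic Standard": 0.30,
    "Grafana Cloud logs": 0.55,  # approximate effective rate
    "CloudWatch Standard logs": 0.50,
}
FREE_GB = {"New Relic Standard": 100.0}  # monthly free tier from the list


def monthly_ingest_cost(platform: str, gb_per_month: float) -> float:
    """Ingestion-only cost in USD for one month of log volume."""
    billable_gb = max(gb_per_month - FREE_GB.get(platform, 0.0), 0.0)
    return round(billable_gb * RATES_PER_GB[platform], 2)


# Hypothetical volumes: before and after adding richer instrumentation.
before_gb, after_gb = 1_000.0, 2_000.0

for platform in RATES_PER_GB:
    before = monthly_ingest_cost(platform, before_gb)
    after = monthly_ingest_cost(platform, after_gb)
    print(f"{platform}: ${before:,.2f} -> ${after:,.2f} per month")
```

Under a pure per-GB model the ingestion line item grows linearly with volume, so doubling your telemetry doubles it (and slightly more than doubles it where a free tier applies).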

And then of course you have to pay for tokens.

Problem 3: You still have to fix, monitor, and root cause production after it all.

Teams have two paths:

Option A: Pay $$ for a bolt-on agent such as Datadog Bits, Resolve AI, Traversal, or any of a dozen other solutions.

Option B: Build your own. Wire Cursor, Claude Code, or a custom agent to your observability backend via MCP.

So, now you have two bills. And sometimes three. In Option B, someone at your company will write and maintain the agent. The evals, the prompt tuning, the debugging when it hallucinates a root cause at 2am.

By the way, I have seen so many companies completely miss the underlying infrastructure cost of in-house agents and engineering salaries in their cost calculations. That part shocks me every time.
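Here is a back-of-the-envelope sketch of that calculation for the build-your-own path. Every number below is a made-up placeholder, and the `InHouseAgentCosts` class is a hypothetical helper; plug in your own figures.

```python
from dataclasses import dataclass


@dataclass
class InHouseAgentCosts:
    # All values are monthly USD placeholders supplied by the reader;
    # none of them come from this post or from any vendor.
    observability_bill: float
    llm_tokens: float
    agent_infrastructure: float    # hosting, vector stores, eval runs, etc.
    engineer_fraction: float       # share of one engineer's time, 0.0 to 1.0
    engineer_monthly_cost: float   # fully loaded cost of that engineer

    def visible(self) -> float:
        """The two bills most teams remember to count."""
        return self.observability_bill + self.llm_tokens

    def total(self) -> float:
        """Visible bills plus the commonly missed infrastructure and salary cost."""
        people = self.engineer_fraction * self.engineer_monthly_cost
        return self.visible() + self.agent_infrastructure + people


costs = InHouseAgentCosts(
    observability_bill=5_000,
    llm_tokens=1_000,
    agent_infrastructure=800,
    engineer_fraction=0.25,
    engineer_monthly_cost=20_000,
)
print(f"visible: ${costs.visible():,.0f}, total: ${costs.total():,.0f}")
```

With these placeholder inputs, the costs that tend to get left out nearly double the monthly figure.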


The alternative

A better way to solve the production problem (and, when you think about it, any problem that involves reasoning over data) is to build infra whose cost doesn't grow with telemetry volume, and where the agent's query patterns are optimized at the infrastructure level rather than billed against you.

Being the conscious team we are, we designed and built Foam around this model. We've made cold storage usable and fast. Agents monitor and drive investigations, and we optimize querying for cost and speed while indexing databases intelligently.

We also built, and continue to evolve, auto-instrumentation agents that take your services, instrument them, and make sure gaps are covered. Today, once you connect your repo, Foam can install and instrument a single service within 30 minutes, so you don't have to do it yourself.

The end result: ingestion, storage, and indexing are included at no additional cost, and we charge per investigation only, while delivering the most accurate results out there thanks to our instrumentation. See our pricing.