Measuring root cause accuracy from telemetry

See the open benchmark for the code and data behind this result.

We published a benchmark for one question we believe every debugging agent should answer cleanly: what caused the production failure?

The benchmark evaluates root cause analysis from telemetry. It is not a log summarization task, and it is not a test of whether an agent can name a nearby service. The expected answer is the kind of answer an engineer can act on: the failing behavior, the relevant signal, and the fault that explains both.
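As a rough illustration of that answer shape, the three parts can be modeled as one record, with accuracy computed as the fraction of cases graded correct. This is a minimal sketch under stated assumptions, not the benchmark's actual schema or grader: the names `RootCauseAnswer`, `is_correct`, and `rca_accuracy` are hypothetical, and the matching rule (normalized string equality on the fault) is a simple stand-in for whatever grading the published benchmark uses.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RootCauseAnswer:
    """Hypothetical record for the three parts of an actionable answer."""
    failing_behavior: str  # what went wrong, as observed in production
    relevant_signal: str   # the telemetry that shows the failure
    fault: str             # the fault that explains both of the above


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences
    # do not count as a wrong answer.
    return " ".join(text.lower().split())


def is_correct(predicted: RootCauseAnswer, expected: RootCauseAnswer) -> bool:
    # Assumption: a case is correct only when the named fault matches.
    # A real grader may be stricter or use a rubric; this is a stand-in.
    return _normalize(predicted.fault) == _normalize(expected.fault)


def rca_accuracy(
    pairs: Sequence[Tuple[RootCauseAnswer, RootCauseAnswer]]
) -> float:
    """Fraction of (predicted, expected) pairs graded correct."""
    if not pairs:
        raise ValueError("no graded cases")
    correct = sum(is_correct(pred, exp) for pred, exp in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    # Invented example data, for illustration only.
    expected = RootCauseAnswer(
        failing_behavior="checkout requests return HTTP 500",
        relevant_signal="error spans in the payment client",
        fault="expired TLS certificate on the payment gateway",
    )
    right = RootCauseAnswer(
        failing_behavior="checkout fails",
        relevant_signal="payment client errors",
        fault="Expired TLS certificate on the  payment gateway",
    )
    nearby_service = RootCauseAnswer(
        failing_behavior="checkout fails",
        relevant_signal="high latency in the cart service",
        fault="cart service is slow",
    )
    score = rca_accuracy([(right, expected), (nearby_service, expected)])
    print(f"RCA accuracy: {score:.0%}")
```

Under this sketch, an answer that only names a nearby service is graded wrong, because it does not identify the fault that explains the failure.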

On this benchmark, Foam reaches 86% RCA accuracy. Cursor with a standard MCP reaches 41%. Cursor with the Foam MCP reaches 64%.

The difference is not only model quality. It is also the shape of the context. Agents are materially better when telemetry is presented as an investigation surface instead of as raw exhaust.

Figure: Root cause accuracy on the public Foam RCA benchmark. Bar chart showing Cursor plus MCP at 41 percent, Cursor plus Foam MCP at 64 percent, and Foam at 86 percent.

We use this benchmark as a product instrument. If Foam improves here, it means the system is getting better at the work customers actually ask it to do: isolate the fault, explain the evidence, and reduce the time between incident and fix.

Foam Research