Pinpointing the Culprit: Automated Failure Attribution in LLM Multi-Agent Systems

Large language model (LLM) multi-agent systems have become a popular approach for tackling complex tasks through collaborative interactions. However, when these systems fail, developers often face a daunting challenge: identifying which agent caused the failure and at what point in the process. Sifting through extensive interaction logs to locate the root cause is time-consuming and labor-intensive, akin to finding a needle in a haystack. This difficulty stems from the autonomous nature of agent collaboration, long information chains, and the subtlety of errors. To address this, researchers from Penn State University and Duke University, in collaboration with institutions including Google DeepMind, University of Washington, Meta, Nanyang Technological University, and Oregon State University, have introduced a novel research problem called Automated Failure Attribution. They have constructed the first benchmark dataset, Who&When, and developed several automated attribution methods. This work, accepted as a Spotlight presentation at ICML 2025, paves the way toward more reliable LLM multi-agent systems. The code and dataset are fully open-source.

The Challenge of Debugging Multi-Agent Systems

LLM-driven multi-agent systems hold immense potential across domains, but their fragility is a major concern. A single agent's error, a misunderstanding between agents, or a misstep in information transmission can cause the entire task to fail. Today, debugging such failures is a largely manual, inefficient process of reading through interaction logs by hand.

This lack of efficient debugging creates a bottleneck for system iteration and optimization. Without a way to quickly identify failure sources, improvements stall.

Why Failures Are Hard to Diagnose

Multi-agent systems involve lengthy chains of autonomous decisions. Agents may act based on incomplete or misinterpreted information, and failures often emerge from cumulative effects rather than a single obvious mistake. The interaction logs can be massive, making manual inspection impractical. Moreover, the same failure could have multiple plausible causes, requiring nuanced analysis.

Introducing Automated Failure Attribution

To overcome these challenges, the research team formally defined the problem of automated failure attribution: given a failed task execution trace, automatically identify the agent(s) responsible and the moment(s) when the failure occurred. This goes beyond simple error detection—it requires understanding causality in the multi-agent dialogue.
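
Concretely, the task can be framed as a function from a failed execution trace to a (who, when) pair. The Python sketch below is a minimal illustration of that interface under our own assumptions; the type and field names are illustrative, not the paper's API.

# Minimal sketch of the failure-attribution interface.
# Type and field names are illustrative assumptions, not the authors' API.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Step:
    index: int         # position of the step in the execution trace
    agent: str         # name of the agent that produced this step
    content: str       # the message or action emitted at this step

@dataclass
class FailureLog:
    task: str          # the task the multi-agent system was asked to solve
    steps: List[Step]  # the full interaction trace of the failed run

def attribute_failure(log: FailureLog) -> Tuple[str, int]:
    """Return (who, when): the responsible agent and the decisive error step."""
    raise NotImplementedError  # concrete strategies are sketched later in the article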

The Who&When Benchmark Dataset

The team created the first benchmark for this task, named Who&When. The dataset includes diverse multi-agent scenarios with detailed ground-truth labels indicating which agent deviated from expected behavior and at which step. It covers different types of failures such as logical errors, misinformation, and coordination breakdowns. This dataset enables systematic evaluation of attribution methods.
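
To give a sense of what such annotations might look like, here is a hypothetical Who&When-style record; the field names and agent names below are assumptions for illustration, and the released dataset's actual schema may differ.

# Hypothetical example of an annotated failure record.
# Field names and agent names are illustrative; consult the released
# Who&When dataset for the actual schema.
example_record = {
    "task": "Find the publication year of the cited survey paper.",
    "history": [
        {"agent": "Orchestrator", "content": "Assign the search to WebSurfer."},
        {"agent": "WebSurfer", "content": "The survey was published in 2019."},
        {"agent": "Verifier", "content": "Accepted. Final answer: 2019."},
    ],
    "mistake_agent": "WebSurfer",  # who: the agent responsible for the failure
    "mistake_step": 1,             # when: the index of the decisive error step
    "mistake_reason": "Retrieved the year of a different paper with a similar title.",
}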

Proposed Attribution Methods

The researchers developed and tested several automated attribution approaches, ranging from lightweight heuristics to more sophisticated reasoning-based techniques. These methods analyze agent interactions, often using LLMs themselves as evaluators to identify anomalies or contradictions. The results highlight the complexity of the task: while some methods perform well, automatic attribution remains challenging, especially over long interaction traces.
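
One of the simplest reasoning-based strategies in this setting is to walk through the trace step by step and ask an LLM judge whether each step contains the decisive error. The sketch below illustrates that idea on the FailureLog structure sketched earlier; the judge call is left abstract, since concrete prompts and models are design choices rather than something fixed here.

# Step-by-step attribution sketch: ask an LLM judge about each step in order
# and stop at the first step it flags as the decisive error.
# `llm_judge` is an abstract callable (prompt -> short "yes"/"no" answer);
# wiring it to a concrete model or provider is left to the reader.
from typing import Callable, Optional, Tuple

def step_by_step_attribution(
    log: FailureLog,                   # FailureLog/Step as sketched earlier
    llm_judge: Callable[[str], str],
) -> Optional[Tuple[str, int]]:
    context = f"Task: {log.task}\n"
    for step in log.steps:
        context += f"[{step.index}] {step.agent}: {step.content}\n"
        prompt = (
            "The task above ultimately failed. Considering only the "
            f"conversation so far, does step {step.index} contain the decisive "
            "error that caused the failure? Answer yes or no.\n\n" + context
        )
        if llm_judge(prompt).strip().lower().startswith("yes"):
            return step.agent, step.index  # (who, when)
    return None  # the judge never flagged a step

Other strategies evaluated in the paper hand the judge the entire log in a single prompt or search over the trace rather than scanning it linearly; the choice trades prompt length against the number of judge calls and the risk of errors accumulating across them.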

Significance and Impact

This work represents a foundational step toward improving the reliability of LLM multi-agent systems. By providing a benchmark and initial methods, it enables the research community to systematically tackle failure attribution. The open-source release of code and dataset invites further innovation. Potential applications include automated debugging tools, self-healing agents, and more robust system design.

Open Source and Future Work

The paper, available on arXiv, has been accepted as a Spotlight at ICML 2025. The co-first authors are Shaokun Zhang (Penn State) and Ming Yin (Duke), with contributions from researchers at the collaborating institutions. Future work may extend attribution to dynamic multi-turn interactions, integrate with real-time monitoring, and explore reinforcement learning from attribution feedback. This research opens a new avenue for enhancing the trustworthiness of AI systems.
