Seeing inside the loop
When an agent fails, you need to know why. Observability captures the full record of a run (every prompt, thought, tool call, and observation) so you can replay and debug it.
What to capture
- Traces: the ordered sequence of steps from goal to final answer
- Tool inputs and outputs: the exact arguments and returned results
- Token and cost metrics: how much each step consumed
- Errors and retries: what failed and how the agent recovered
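As a rough illustration, the four items above could be stored as one record per step. This is a minimal sketch, not a standard schema: the StepRecord class, its field names, and the to_jsonl helper are all hypothetical and chosen only for this example.

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Optional


@dataclass
class StepRecord:
    """One step of an agent run (hypothetical schema, for illustration only)."""
    index: int                    # position in the ordered trace
    kind: str                     # e.g. "model_call" or "tool_call"
    inputs: dict                  # exact prompt or tool arguments
    output: Any = None            # returned result, if any
    prompt_tokens: int = 0        # token metrics for this step
    completion_tokens: int = 0
    cost_usd: float = 0.0         # cost metric for this step
    error: Optional[str] = None   # what failed, if anything
    retries: int = 0              # how many times the step was retried


def to_jsonl(steps):
    """Serialize steps as one JSON object per line, in trace order."""
    ordered = sorted(steps, key=lambda s: s.index)
    return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in ordered)


# Example data is made up; steps are given out of order on purpose.
steps = [
    StepRecord(index=1, kind="tool_call",
               inputs={"tool": "search", "query": "weather Paris"},
               output={"temp_c": 18}, error="timeout", retries=1),
    StepRecord(index=0, kind="model_call",
               inputs={"prompt": "What is the weather in Paris?"},
               output="call search", prompt_tokens=12, completion_tokens=5,
               cost_usd=0.0001),
]
log = to_jsonl(steps)
```

Writing one JSON object per line keeps the log append-only and easy to replay step by step, which is one reasonable choice among several.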
A trace structure
Each run is a tree of spans. A top-level span holds the whole task, and nested spans capture each model call and tool execution.
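The tree of spans described above can be sketched with a small recursive data structure. The Span class, its methods, and the kind labels ("task", "model_call", "tool_call") are assumptions made for this example and do not come from any particular tracing library.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """A node in the trace tree (hypothetical structure, for illustration)."""
    name: str
    kind: str  # "task", "model_call", or "tool_call" (labels are assumptions)
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None
    children: List["Span"] = field(default_factory=list)

    def child(self, name, kind):
        """Create a nested span and attach it under this one."""
        span = Span(name, kind)
        self.children.append(span)
        return span

    def finish(self):
        """Record the end time of this span."""
        self.end = time.monotonic()

    def walk(self, depth=0):
        """Yield (depth, span) pairs in depth-first order."""
        yield depth, self
        for c in self.children:
            yield from c.walk(depth + 1)


# The top-level span holds the whole task; nested spans hold each call.
root = Span("answer user question", "task")
root.child("plan", "model_call").finish()
root.child("search", "tool_call").finish()
root.child("final answer", "model_call").finish()
root.finish()

outline = ["  " * depth + f"{s.kind}: {s.name}" for depth, s in root.walk()]
```

Joining outline with newlines gives an indented view of the run, which is often enough to see where a failing step sits in the tree.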
Why it matters
Agents are nondeterministic, so a bug may appear in one run and vanish in the next. Without stored traces you cannot reproduce or explain failures. Good observability also surfaces silent regressions, like rising token cost or a tool quietly returning errors that the agent ignores.
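The two silent regressions mentioned above can be checked against stored traces with a few lines of code. This is only a sketch under stated assumptions: the step dictionaries, the 20% tolerance, and the rule that an error counts as "ignored" when the next step does not retry the same tool are all invented for this example.

```python
from statistics import mean


def token_cost_regressed(baseline, recent, tolerance=0.2):
    """True if mean tokens per run in `recent` exceed the baseline mean
    by more than `tolerance` (0.2 means 20%; the threshold is an assumption)."""
    return mean(recent) > mean(baseline) * (1 + tolerance)


def ignored_tool_errors(steps):
    """Return tool-call steps that errored and were not immediately
    followed by another call to the same tool."""
    ignored = []
    for i, step in enumerate(steps):
        if step["kind"] != "tool_call" or step.get("error") is None:
            continue
        nxt = steps[i + 1] if i + 1 < len(steps) else None
        retried = (nxt is not None and nxt["kind"] == "tool_call"
                   and nxt["tool"] == step["tool"])
        if not retried:
            ignored.append(step)
    return ignored


# Made-up token totals per run and a made-up trace.
baseline = [1000, 1100, 900]
recent = [1300, 1250, 1400]
trace = [
    {"kind": "model_call"},
    {"kind": "tool_call", "tool": "search", "error": "HTTP 500"},
    {"kind": "model_call"},  # the agent carries on without retrying
    {"kind": "tool_call", "tool": "fetch", "error": "timeout"},
    {"kind": "tool_call", "tool": "fetch", "error": None},  # retry succeeds
]

cost_alert = token_cost_regressed(baseline, recent)
ignored = ignored_tool_errors(trace)
```

Checks like these only work if traces are stored for every run, since both compare the current run against earlier steps or earlier runs.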
Key idea
Observability records the full trace of every agent run, including prompts, tool calls, and metrics, so that nondeterministic failures can be reproduced, explained, and fixed.