The Agent Error Recovery

How agents detect failed tool calls and recover instead of crashing or hallucinating.

Failure is normal

Tools time out, return errors, or hand back garbage. A robust agent expects this and has a recovery strategy so one bad step does not derail the whole task.

Recovery moves

Detect: read the tool result and notice an error code, empty payload, or malformed output.
Retry: call again, possibly with a small delay, for transient failures.
Repair: fix the arguments if the error names a bad parameter.
Reroute: switch to a different tool that can answer the same need.
Escalate: ask a human or report the blocker when nothing works.

Feeding the error back

The key trick is to put the error message into the context as an observation. The model then reasons about it and chooses the next move, rather than ignoring the failure.

Pitfalls

Blind retries on a permanent error waste budget.
A retry cap prevents infinite loops.
The agent must distinguish a real error from a valid but unexpected result.

Treating errors as observations to reason about, not exceptions to swallow, is what makes an agent resilient.

Key idea