Trace Based Alerting

Triggering alerts on patterns inside traces, not just on aggregate metric thresholds.

Beyond Metric Thresholds

A classic alert fires when a metric crosses a line, like error rate above two percent. But metrics flatten away the structure of a request. Trace based alerting evaluates conditions against the trace itself.

What You Can Express

Span level conditions: alert when a specific downstream span exceeds a latency budget.
Structural conditions: alert when a trace contains an unexpected dependency or a retry storm.
Combined conditions: alert when a slow trace also has an exception event on a payment span.

These are precise in a way a single aggregate cannot be. A metric says latency rose; a trace based alert says checkout is slow specifically because the inventory call is timing out.

The Trade Offs

Evaluating every trace is expensive, so rules often run on the sampled or tail selected set. There is also a risk of noisy rules that match many traces, so conditions must be specific enough to be actionable.

Key idea

Trace based alerting evaluates rules against the structure and spans of a trace, catching precise failures that aggregate metric thresholds hide.

Trace Based Alerting

Beyond Metric Thresholds

What You Can Express

The Trade Offs

Key idea

Check yourself