Sampling Strategies for Traces

Keeping a useful fraction of traces so cost stays sane without losing the interesting ones.

Why sample

Recording every span of every request is expensive in network, storage, and processing. Sampling keeps a fraction of traces. The art is keeping the ones that matter, especially errors and slow requests.

Head-based sampling

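The decision is made once, at the start of the trace in the first service, and propagated downstream with the trace context (for example, the sampled flag in the W3C traceparent header) so the whole trace is consistently kept or dropped. It is simple and cheap, but the decision happens before anyone knows whether the request will fail or be slow, so rare problems are easily missed: at a 1% sampling rate, 99% of failing requests are dropped along with everything else.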
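A minimal Python sketch of a head-based decision, assuming a hypothetical `TraceContext` that carries a 128-bit trace ID and a sampled flag between services. The root service decides once by comparing part of the trace ID against a threshold, and downstream services only copy the flag, so every service agrees. `TraceContext`, `start_trace`, and `continue_trace` are illustrative names, not any tracing library's API.

```python
import random
from dataclasses import dataclass


# Hypothetical trace context passed between services (in spirit, a trace ID
# plus a "sampled" flag carried in a propagation header).
@dataclass(frozen=True)
class TraceContext:
    trace_id: int   # 128-bit random ID chosen by the root service
    sampled: bool   # the head decision, propagated to every downstream service


def start_trace(ratio: float, rng: random.Random) -> TraceContext:
    """Root service: make the keep/drop decision once, before any work runs."""
    trace_id = rng.getrandbits(128)
    # Compare the low 64 bits of the ID with a threshold, so the decision is a
    # pure function of the trace ID and any service could recompute it.
    threshold = int(ratio * (1 << 64))
    sampled = (trace_id & ((1 << 64) - 1)) < threshold
    return TraceContext(trace_id, sampled)


def continue_trace(parent: TraceContext) -> TraceContext:
    """Downstream service: never re-decide, just honor the parent's flag."""
    return TraceContext(parent.trace_id, parent.sampled)


rng = random.Random(42)
roots = [start_trace(0.1, rng) for _ in range(10_000)]
print(sum(ctx.sampled for ctx in roots))  # roughly 1,000 of 10,000
```

Deriving the decision from the trace ID, rather than a fresh coin flip, means a service that somehow lost the flag would still reach the same answer; in this sketch the flag is what downstream services actually rely on.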

Tail-based sampling

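The decision is made after the trace completes, at a collector that buffers all spans of a trace. Because a collector cannot always tell when a trace has finished, it typically waits a fixed period after the first span arrives before deciding. Tail sampling can keep every error and every slow trace while sampling normal traffic lightly. It is far more selective, but it needs memory to buffer spans for that waiting period and a way to route every span of a trace to the same collector instance, for example by load balancing on trace ID.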
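A minimal Python sketch of a tail sampler, assuming all spans of a trace reach this one instance and that the caller supplies the current time. It buffers spans by trace ID, waits `decision_wait_s` after the first span, then keeps any trace with an error or a span at or above `slow_ms`, and samples the rest at `baseline_ratio`. `Span`, `TailSampler`, and all parameter names are hypothetical, not a real collector's configuration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False


@dataclass
class TailSampler:
    """Hypothetical tail sampler: buffer spans per trace, decide after a wait."""
    decision_wait_s: float   # how long to buffer a trace before deciding
    slow_ms: float           # latency at or above which a trace counts as slow
    baseline_ratio: float    # fraction of normal (fast, successful) traces kept
    rng: random.Random = field(default_factory=random.Random)
    _buffer: dict = field(default_factory=lambda: defaultdict(list), init=False)
    _first_seen: dict = field(default_factory=dict, init=False)

    def add(self, span: Span, now: float) -> None:
        # Assumes every span of a trace is routed to this same instance.
        self._buffer[span.trace_id].append(span)
        self._first_seen.setdefault(span.trace_id, now)

    def flush(self, now: float) -> list:
        """Decide on traces buffered for at least decision_wait_s; return the kept ones."""
        ready = [t for t, t0 in self._first_seen.items()
                 if now - t0 >= self.decision_wait_s]
        kept = []
        for trace_id in ready:
            spans = self._buffer.pop(trace_id)
            del self._first_seen[trace_id]
            if self._keep(spans):
                kept.append(spans)
        return kept

    def _keep(self, spans: list) -> bool:
        if any(s.error for s in spans):
            return True   # always keep failures
        # The longest span (usually the root) approximates end-to-end latency.
        if max(s.duration_ms for s in spans) >= self.slow_ms:
            return True   # always keep slow requests
        return self.rng.random() < self.baseline_ratio  # sample the rest lightly


sampler = TailSampler(decision_wait_s=10, slow_ms=500, baseline_ratio=0.0)
sampler.add(Span("a", "GET /ok", 40), now=0)
sampler.add(Span("b", "GET /fail", 30, error=True), now=0)
sampler.add(Span("c", "GET /slow", 900), now=0)
print(sampler.flush(now=5))                               # [] (still waiting)
print(sorted(t[0].trace_id for t in sampler.flush(now=10)))  # ['b', 'c']
```

The buffer is the memory cost the text mentions: every span of every in-flight trace stays in memory until its wait expires. In this sketch a span that arrives after its trace was decided simply starts a new buffer, so a real deployment needs a policy for late spans.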

Common policies

Probabilistic: keeps a fixed percentage of traces, chosen uniformly at random.
Rate limiting: caps the number of traces kept per second to bound cost.
Error and latency biased: always keeps failed and slow requests.
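The three policies can be combined in one decision function. The sketch below is one possible ordering, assuming a hypothetical `TraceSummary` with an error flag and a duration: errors and slow traces always pass, normal traces are first sampled probabilistically, and a token bucket then caps how many of those are kept per second. `TraceSummary`, `TokenBucket`, and `decide` are illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceSummary:
    has_error: bool
    duration_ms: float


class TokenBucket:
    """Rate limiting: at most `rate` traces per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def decide(trace: TraceSummary, now: float, *, slow_ms: float, ratio: float,
           bucket: TokenBucket, rng: random.Random) -> bool:
    # Error and latency biased: failures and slow requests always pass.
    if trace.has_error or trace.duration_ms >= slow_ms:
        return True
    # Probabilistic: keep a fixed fraction of normal traffic...
    if rng.random() >= ratio:
        return False
    # ...then rate limiting caps how many of those are kept per second.
    return bucket.allow(now)


rng = random.Random(0)
bucket = TokenBucket(rate=2, burst=2)
normal = TraceSummary(has_error=False, duration_ms=50)
print(sum(decide(normal, 0.0, slow_ms=500, ratio=1.0, bucket=bucket, rng=rng)
          for _ in range(10)))  # 2: the burst of 2 is used, then refused
```

Because failures bypass the bucket here, a storm of errors is not capped; a stricter design could give errors their own, larger bucket.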

Key idea

Head sampling decides cheaply up front, while tail sampling buffers a trace and keeps the errors and slow requests that matter most.
