Why sample
Recording every span of every request is expensive in network, storage, and processing. Sampling keeps a fraction of traces. The art is keeping the ones that matter, especially errors and slow requests.
Head-based sampling
The decision is made at the start of the trace, at the first service, and propagated so the whole trace is consistently kept or dropped. It is simple and cheap, but the decision happens before you know if the request will fail or be slow, so rare problems may be missed.
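A minimal sketch of the idea, assuming trace IDs are strings and the decision travels in a small context dictionary. The names `head_sample`, `start_trace`, and `propagate` are hypothetical and not taken from any tracing library. Hashing the trace ID is one common way to make the decision repeatable; a random draw at the first service would work as well, as long as the result is propagated.

```python
import hashlib


def head_sample(trace_id: str, rate: float) -> bool:
    """Decide from the trace ID alone whether to keep the whole trace."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    # Keep the trace if it falls in the lowest `rate` share of that range.
    return value < int(rate * 2**64)


def start_trace(trace_id: str, rate: float) -> dict:
    """First service: make the decision once (hypothetical context format)."""
    return {"trace_id": trace_id, "sampled": head_sample(trace_id, rate)}


def propagate(context: dict) -> dict:
    """Downstream service: reuse the upstream decision, never re-decide."""
    return {"trace_id": context["trace_id"], "sampled": context["sampled"]}


root = start_trace("trace-0001", rate=0.10)
child = propagate(root)
# Every service sees the same flag, so the trace is kept or dropped as a unit.
print(root["sampled"] == child["sampled"])
```

Note that nothing in `head_sample` looks at status codes or durations. That is exactly the limitation described above: the outcome of the request is not yet known when the flag is set.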
Tail-based sampling
The decision is made after the trace completes, at a collector that buffers all spans for a request. Because the outcome is known by then, it can keep every error and every slow trace while sampling normal traffic lightly. This catches the rare problems that head-based sampling can miss, but it costs more: the collector needs memory to buffer spans until the decision is made, and all spans of a trace must be routed to the same place.
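The buffering step can be sketched as follows. This is an illustration under stated assumptions, not a production collector: spans are plain dictionaries with hypothetical `duration_ms` and `error` fields, the caller signals completion by calling `finish`, and the longest span stands in for the trace duration. The class name `TailSampler` and the 500 ms threshold are likewise made up for the example.

```python
import random
from collections import defaultdict


class TailSampler:
    """Buffers spans per trace and decides once the trace is complete."""

    def __init__(self, slow_ms=500.0, baseline_rate=0.01, rng=None):
        self.slow_ms = slow_ms              # assumed latency threshold
        self.baseline_rate = baseline_rate  # share of normal traces to keep
        self.rng = rng or random.Random()
        self.buffer = defaultdict(list)     # trace_id -> list of spans

    def add_span(self, trace_id, span):
        # Memory grows with the number of in-flight traces.
        self.buffer[trace_id].append(span)

    def finish(self, trace_id):
        """Return the spans if the trace is kept, otherwise None."""
        spans = self.buffer.pop(trace_id, [])
        if not spans:
            return None
        has_error = any(s.get("error", False) for s in spans)
        # Assumption: the longest span approximates the trace duration.
        is_slow = max(s["duration_ms"] for s in spans) >= self.slow_ms
        if has_error or is_slow:
            return spans
        # Normal traffic is sampled lightly.
        if self.rng.random() < self.baseline_rate:
            return spans
        return None


sampler = TailSampler(baseline_rate=0.0)
sampler.add_span("t1", {"name": "gateway", "duration_ms": 40.0})
sampler.add_span("t1", {"name": "db", "duration_ms": 12.0, "error": True})
print(sampler.finish("t1") is not None)  # kept because one span failed
```

A real collector also has to decide when a trace counts as complete, usually by waiting for a fixed window, and has to evict buffered traces when memory runs short. Both are left out here.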
Common policies
- Probabilistic sampling keeps a fixed percentage of traces, chosen uniformly at random.
- Rate limiting caps the number of traces kept per second to bound cost.
- Error- and latency-biased sampling always keeps failures and slow requests.
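The three policies can also be combined. The sketch below is one possible arrangement, not a standard one: failures and slow requests bypass the other checks, and the remaining traffic is sampled probabilistically and then capped by a token bucket. The names `RateLimitingSampler` and `keep_trace`, the 500 ms threshold, and the explicit `now` timestamp (passed in to keep the example deterministic) are all assumptions made for illustration.

```python
import random


class RateLimitingSampler:
    """Token bucket that allows at most `max_per_second` kept traces."""

    def __init__(self, max_per_second: float):
        self.max_per_second = max_per_second
        self.tokens = max_per_second  # start with a full bucket
        self.last = None              # timestamp of the previous call

    def should_keep(self, now: float) -> bool:
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            # Refill in proportion to elapsed time, up to the bucket size.
            self.tokens = min(self.max_per_second,
                              self.tokens + elapsed * self.max_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def keep_trace(has_error, duration_ms, now, limiter, rate, rng, slow_ms=500.0):
    """Combine the policies: biased first, then probabilistic, then capped."""
    # Error and latency biased: always keep failures and slow requests.
    if has_error or duration_ms >= slow_ms:
        return True
    # Probabilistic: drop most normal traces uniformly at random.
    if rng.random() >= rate:
        return False
    # Rate limiting: bound how many normal traces survive per second.
    return limiter.should_keep(now)


limiter = RateLimitingSampler(max_per_second=2)
rng = random.Random(0)
# An error is kept regardless of the other policies.
print(keep_trace(True, 20.0, now=0.0, limiter=limiter, rate=0.0, rng=rng))
```

In this arrangement the cap applies only to normal traffic, so a burst of errors can still exceed the budget. Applying the limiter to errors as well would bound cost strictly, at the price of dropping some failures.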
Key idea
Head-based sampling decides cheaply up front, before the outcome is known, while tail-based sampling buffers each trace and so can keep the errors and slow requests that matter most.