Sampling Traces

Tracing every request in a busy system would produce a crushing volume of data. Sampling keeps only a fraction of traces while still giving useful insight. The question is which fraction and how to choose it.

Head-based sampling

In head-based sampling the decision is made when the request starts, before the outcome is known. A simple rule keeps, say, one in a hundred traces. It is cheap and easy because the decision is made once and then rides along in the trace context to every downstream service. The weakness is that rare errors and slow requests are sampled at the same rate as everything else, so most of them are discarded.
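A minimal sketch of a head-based decision, assuming trace IDs are strings. Hashing the ID (rather than drawing a random number) is an assumption added here so the same ID always yields the same answer; the names `head_sample`, `start_request`, and `SAMPLE_RATE` are hypothetical.

```python
import hashlib

SAMPLE_RATE = 0.01  # illustrative: keep roughly one trace in a hundred


def head_sample(trace_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Decide at request start, using only the trace ID."""
    # Map the ID to a 64-bit integer and compare it with a threshold
    # proportional to the rate. The same ID always gives the same result.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    return value < int(rate * 2**64)


def start_request(trace_id: str) -> dict:
    """Build the context that travels with the request (hypothetical shape)."""
    return {"trace_id": trace_id, "sampled": head_sample(trace_id)}
```

Downstream services would read `sampled` from the context instead of deciding again. Nothing about the request's outcome feeds into the decision, which is exactly why errors are kept only at the base rate.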

Tail-based sampling

In tail-based sampling the system buffers spans and decides after the trace finishes. Now it can keep the interesting ones, such as every error or any request slower than a threshold, while dropping routine fast successes. This catches the traces you actually want, but it needs memory to hold every span until its trace completes.
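The buffering and the late decision can be sketched as follows. This is an illustration under stated assumptions, not a production design: the `Span` fields, the `TailSampler` class, and the 500 ms default threshold are all hypothetical, and a trace counts as slow here if any one of its spans exceeds the threshold.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False


class TailSampler:
    def __init__(self, slow_threshold_ms: float = 500.0):
        self.slow_threshold_ms = slow_threshold_ms
        self._buffer = defaultdict(list)  # trace_id -> spans seen so far

    def record(self, span: Span) -> None:
        # Every span is held in memory until its trace finishes.
        self._buffer[span.trace_id].append(span)

    def pending_traces(self) -> int:
        # Number of unfinished traces currently costing memory.
        return len(self._buffer)

    def finish(self, trace_id: str) -> list:
        """Return the spans to keep, or an empty list if the trace is dropped."""
        spans = self._buffer.pop(trace_id, [])
        has_error = any(s.error for s in spans)
        is_slow = any(s.duration_ms > self.slow_threshold_ms for s in spans)
        return spans if (has_error or is_slow) else []
```

The buffer is where the memory cost shows up: it grows with the number of traces in flight, and a real system would also need a timeout for traces that never finish.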

Choosing

Head-based sampling is simple and cheap but blind to outcomes.
Tail-based sampling is smarter but costs buffering and coordination.
Many systems combine both, sampling broadly at the head and forcing keeps for errors at the tail.
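The combined approach reduces to a small rule, sketched here with a hypothetical `keep_trace` function. It assumes the head decision is already available as a boolean and that error flags are still collected for traces the head sampler did not pick, since otherwise there would be nothing to force-keep.

```python
def keep_trace(head_sampled: bool, span_errors: list) -> bool:
    """Keep a trace if the head sampler chose it or any span reported an error."""
    return head_sampled or any(span_errors)


# Routine success the head sampler skipped: dropped.
routine = keep_trace(head_sampled=False, span_errors=[False, False])
# Error the head sampler skipped: force-kept at the tail.
failed = keep_trace(head_sampled=False, span_errors=[False, True])
```

The head rate then controls the baseline volume of ordinary traces, while the tail rule guarantees that errors survive regardless of that rate.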

Key idea

Head sampling decides cheaply at the start but misses outliers, while tail sampling buffers and keeps the errors and slow traces that matter.
