When Spans Are Not Enough
A trace tells you that a span took 200 milliseconds, but not which code inside the span consumed that time. Distributed profiling drills below the span to the function and line level, and does so across many services.
Sampling the Stack
A sampling profiler periodically captures the call stack of each running thread. Aggregate thousands of these samples and you get a statistical picture of where CPU time goes, usually shown as a flame graph in which each frame's width is proportional to the number of samples that contain it.
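- Low overhead: capturing stacks at a fixed rate, typically tens to a hundred times per second, costs far less than instrumenting every function call.
- Statistical: hot functions appear wide because they are on the stack in more samples; functions that run rarely may not be caught at all.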
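To make the sampling idea concrete, here is a minimal sketch of a sampler in Python. It is an illustration, not how any particular profiler is built: the function names (capture_stack, sample_thread, samples_by_leaf), the 10 ms interval, and the spin workload are all assumptions chosen for the example. It relies on sys._current_frames(), a CPython helper that returns the current frame of every thread.

```python
import collections
import sys
import threading
import time


def capture_stack(frame):
    """Walk from a frame back to the root; return function names, root first."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(names))


def sample_thread(thread_id, duration_s=0.5, interval_s=0.01):
    """Sample one thread's stack at a fixed interval; count each distinct stack."""
    counts = collections.Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # sys._current_frames() maps thread id -> that thread's current frame.
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[capture_stack(frame)] += 1
        time.sleep(interval_s)
    return counts


def samples_by_leaf(counts):
    """Attribute each sample to the innermost function on its stack."""
    leaf = collections.Counter()
    for stack, n in counts.items():
        leaf[stack[-1]] += n
    return leaf


if __name__ == "__main__":
    stop = threading.Event()

    def spin():  # hypothetical hot function to profile
        total = 0
        while not stop.is_set():
            total += 1

    worker = threading.Thread(target=spin)
    worker.start()
    counts = sample_thread(worker.ident)
    stop.set()
    worker.join()
    for name, n in samples_by_leaf(counts).most_common(3):
        print(name, n)
```

The counts keyed by whole stacks are the raw material for a flame graph; the per-leaf totals are a rough view of which functions were running when samples landed. The exact numbers vary from run to run, which is the statistical nature described above.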
Tying It to Traces
The real power comes from linking profiles to traces. You find a slow span, then open its profile to see the exact functions that consumed its time. Tracing localizes the slow service and span; profiling localizes the slow code.
Key idea
Distributed profiling samples call stacks to show which functions burn CPU, and linking it to traces takes you from a slow span down to the exact hot code.