Measure, do not guess
Intuition about hot code is often wrong. A CPU profiler samples the running stack many times per second, so functions that appear in more samples are using more CPU.
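To make the sampling idea concrete, here is a minimal sketch of a sampling profiler in Python. It assumes CPython, where `sys._current_frames()` exposes each thread's current frame. The names `sample_stacks`, `profile`, and `busy_work` are hypothetical, chosen for illustration. Real profilers sample from outside the interpreter with much lower overhead, so treat this as a toy rather than a tool.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, stop_event, counts, interval=0.001):
    """Record the target thread's call stack roughly every `interval` seconds."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        if stack:
            # Store root first so each stack reads caller -> callee.
            counts[tuple(reversed(stack))] += 1
        time.sleep(interval)


def busy_work(n):
    # Hypothetical hot function: pure CPU work.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile(func, *args):
    """Run func(*args) while a background thread samples the calling thread."""
    counts = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks, args=(threading.get_ident(), stop, counts)
    )
    sampler.start()
    try:
        func(*args)
    finally:
        stop.set()
        sampler.join()
    return counts


if __name__ == "__main__":
    samples = profile(busy_work, 5_000_000)
    for stack, hits in samples.most_common(3):
        print(hits, ";".join(stack))
```

Stacks that end in `busy_work` should dominate the counts, because that is where the thread spends nearly all of its time. The exact numbers vary from run to run.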
The flamegraph
A flamegraph turns those samples into a picture.
- Each box is a function in the call stack.
- Width shows how much total CPU time it and its callees used.
- Stacking shows callee above caller in the standard layout, so deep towers are deep call chains.
Read it by width, not height: the widest boxes are where the time is spent.
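The width rule can be sketched as a small aggregation over sampled stacks. The sample data and the function name `frame_widths` below are made up for illustration. The sketch also merges frames by function name, whereas a real flamegraph keeps a separate box for each distinct call path.

```python
from collections import Counter

# Hypothetical samples: each tuple is one sampled stack, root first.
SAMPLES = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "render"),
    ("main", "handle_request"),
    ("main", "log_stats"),
]


def frame_widths(stacks):
    """Return (total, own) sample counts per function name.

    total: samples where the function appears anywhere in the stack,
           which corresponds to the width of its box.
    own:   samples where the function is the last frame, meaning it
           was the one actually running on the CPU.
    """
    total = Counter()
    own = Counter()
    for stack in stacks:
        # set() counts a function once per sample, even if it recurses.
        for name in set(stack):
            total[name] += 1
        own[stack[-1]] += 1
    return total, own


def report(stacks):
    """Format one line per function, widest first."""
    total, own = frame_widths(stacks)
    n = len(stacks)
    return [
        f"{name:<16} total {100 * count / n:5.1f}%  self {100 * own[name] / n:5.1f}%"
        for name, count in total.most_common()
    ]


if __name__ == "__main__":
    print("\n".join(report(SAMPLES)))
```

In this made-up data `main` is the widest frame because every sample passes through it, yet its own time is zero. That is why width points you at a region of the graph, while the frames with the most self time inside that region are what you would actually change.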
Reading clues
- A wide leaf (a box with nothing stacked on it) is a function spending CPU in its own code, which makes it a good optimization target.
- A wide plateau of many narrow leaves suggests overhead spread thin, like serialization or allocation.
- Surprising frames often reveal accidental work such as logging in a hot loop.
After the fix
Profile again to confirm the hot path shrank and that the bottleneck did not simply move elsewhere. Optimization without a fresh profile is guessing.
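One way to check a fix is to compare per-function sample counts from the two profiles. The sketch below assumes both runs used the same workload and the same sampling rate, so that sample counts are comparable as a stand-in for CPU time. The data and the name `compare_profiles` are hypothetical.

```python
def compare_profiles(before, after):
    """Return (name, before_count, after_count) rows, biggest reduction first."""
    names = set(before) | set(after)
    rows = [(name, before.get(name, 0), after.get(name, 0)) for name in names]
    # Sort by change in samples; ties are broken by name for stable output.
    return sorted(rows, key=lambda row: (row[2] - row[1], row[0]))


# Hypothetical self-time sample counts per function, before and after a fix.
BEFORE = {"parse_json": 400, "render": 100, "log_stats": 50}
AFTER = {"parse_json": 120, "render": 100, "log_stats": 50, "validate_schema": 200}

if __name__ == "__main__":
    for name, old, new in compare_profiles(BEFORE, AFTER):
        print(f"{name:<16} {old:>5} -> {new:>5}  ({new - old:+d})")
    print("total samples:", sum(BEFORE.values()), "->", sum(AFTER.values()))
```

In this invented example `parse_json` shrank, but a new `validate_schema` frame absorbed much of the saving. That is the "bottleneck moved elsewhere" case a second profile is meant to catch.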
Key idea
Sample stacks into a flamegraph and optimize the widest frames, then reprofile to confirm the hot path actually shrank.