CPU Profiling and Flamegraphs

Measure, do not guess

Intuition about hot code is often wrong. A CPU profiler samples the running stack many times per second, so functions that appear in more samples are using more CPU.
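To make the sampling idea concrete, here is a minimal sketch of a sampling profiler, not a production tool. It assumes CPython, where sys._current_frames() exposes each thread's current frame. The names sample_stacks, profile, and busy are hypothetical helpers invented for this illustration.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id, stop_event, counts, interval=0.001):
    """Record the call stack of one thread roughly every `interval` seconds."""
    while not stop_event.is_set():
        # CPython-specific: maps thread id -> that thread's current frame.
        frame = sys._current_frames().get(thread_id)
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        if names:
            # Store the stack root-first, e.g. ("<module>", "profile", "busy").
            counts[tuple(reversed(names))] += 1
        time.sleep(interval)


def profile(func, interval=0.001):
    """Run func() while a background thread samples the calling thread."""
    counts = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), stop, counts, interval),
    )
    sampler.start()
    try:
        func()
    finally:
        stop.set()
        sampler.join()
    return counts


def busy(n=3_000_000):
    """Hypothetical CPU-bound workload used only for this demonstration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # Print the most frequently sampled stacks, one per line.
    for stack, hits in profile(busy).most_common(3):
        print(hits, ";".join(stack))
```

Stacks that show up in more samples were on the CPU more often. Real profilers do the same thing with far lower overhead and can also see native frames.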

The flamegraph

A flamegraph turns those samples into a picture.

Each box is a function in the call stack.
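Width shows the share of samples that contain the frame, that is, the CPU time used by the function and its callees.
Stacking shows the callee on top of its caller in the standard layout, so tall towers are deep call chains. Icicle layouts flip this and grow downward.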
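The three rules above can be sketched in a few lines. This is an illustrative sketch, not how any particular flamegraph tool is implemented. It treats each box as a stack prefix and computes its width as a fraction of all samples. The function name box_widths and the sample counts are made up for the example.

```python
from collections import Counter


def box_widths(samples):
    """Map each stack prefix (one flamegraph box) to its share of all samples."""
    total = sum(samples.values())
    widths = Counter()
    for stack, count in samples.items():
        # A sample counts toward every frame on its stack, root to leaf.
        for depth in range(1, len(stack) + 1):
            widths[stack[:depth]] += count
    return {prefix: hits / total for prefix, hits in widths.items()}


# Hypothetical sampled stacks (root first) with made-up sample counts.
samples = Counter({
    ("main", "parse"): 20,
    ("main", "parse", "decode"): 30,
    ("main", "render"): 50,
})

if __name__ == "__main__":
    for prefix, share in sorted(box_widths(samples).items()):
        # Indent by depth so the nesting mirrors the stacking in the picture.
        print("  " * (len(prefix) - 1) + f"{prefix[-1]}: {share:.0%}")
```

Here parse covers half the width even though it is the leaf in only 20 of the 100 samples, because its callee decode counts toward it as well.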

You read it by width, not height. The widest boxes are where time is spent.

Reading clues

A wide leaf, a box with nothing stacked on it, is a function doing the work itself, which makes it a good optimization target.
A wide plateau of many narrow leaves suggests overhead spread thin, like serialization or allocation.
Surprising frames often reveal accidental work such as logging in a hot loop.
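One way to apply these clues without the picture is to rank functions by self samples, meaning samples in which the function was the leaf. The sketch below assumes stacks are stored root first. The function name self_samples and all frame names and counts are hypothetical.

```python
from collections import Counter


def self_samples(samples):
    """Count the samples in which each function was the leaf of the stack."""
    leaves = Counter()
    for stack, count in samples.items():
        leaves[stack[-1]] += count
    return leaves


# Hypothetical profile (root first); every name and count is made up.
samples = Counter({
    ("main", "handle", "compute"): 60,
    ("main", "handle", "log_debug", "format_message"): 25,
    ("main", "handle", "serialize", "encode"): 15,
})

if __name__ == "__main__":
    for name, hits in self_samples(samples).most_common():
        print(f"{name}: {hits}")
```

In this made-up profile compute is the wide leaf, while format_message under log_debug is the kind of surprising frame that points to logging in a hot loop.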

After the fix

Profile again to confirm the hot path shrank and that the bottleneck did not simply move elsewhere. Optimization without a fresh profile is guessing.
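A before-and-after comparison can be sketched as follows. It assumes both profiles were taken with the same workload and sampling rate, so raw sample counts are comparable. The helper names and the numbers are hypothetical.

```python
from collections import Counter


def leaf_counts(samples):
    """Count the samples in which each function was the leaf of the stack."""
    leaves = Counter()
    for stack, count in samples.items():
        leaves[stack[-1]] += count
    return leaves


def compare_profiles(before, after):
    """Return (function, before, after) rows, largest absolute change first."""
    b, a = leaf_counts(before), leaf_counts(after)
    # A Counter returns 0 for functions that are missing from one profile.
    rows = [(name, b[name], a[name]) for name in set(b) | set(a)]
    rows.sort(key=lambda row: (-abs(row[2] - row[1]), row[0]))
    return rows


# Hypothetical profiles from before and after a fix; all counts are made up.
before = Counter({("main", "compute"): 60, ("main", "encode"): 40})
after = Counter({
    ("main", "compute"): 10,
    ("main", "encode"): 40,
    ("main", "lookup"): 15,
})

if __name__ == "__main__":
    for name, old, new in compare_profiles(before, after):
        print(f"{name}: {old} -> {new}")
```

In this made-up data compute shrank as intended, but a new lookup frame appeared, which is the sign that some of the cost moved rather than disappeared.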

Key idea