System Design

CPU Profiling and Flamegraphs

Finding where CPU time really goes by sampling stacks into a flamegraph.

Measure, do not guess

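Intuition about which code is hot is often wrong. A sampling CPU profiler records the running call stack many times per second. Samples land on whatever is executing, so the fraction of samples that contain a function approximates the share of CPU time spent in that function and its callees.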
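
A minimal sketch of that estimate, assuming the profiler has already produced a list of stacks (the function names and sample data below are hypothetical, not output from any real tool). Counting how many samples contain each function gives its approximate share of CPU time:

```python
from collections import Counter

# Hypothetical samples: each tuple is one captured call stack,
# listed from the root frame to the leaf frame that was running.
SAMPLES = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "render"),
    ("main", "handle_request", "parse_json"),
    ("main", "log_line"),
]


def inclusive_share(samples):
    """Fraction of samples in which each function appears anywhere on the stack."""
    counts = Counter()
    for stack in samples:
        # set() so a recursive function is counted once per sample
        counts.update(set(stack))
    return {fn: n / len(samples) for fn, n in counts.items()}


shares = inclusive_share(SAMPLES)
for fn, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{fn:15s} {share:.0%}")
```

With only five samples the estimate is coarse. Real profilers collect thousands, which is why the shares become trustworthy.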

The flamegraph

A flamegraph turns those samples into a picture.

  • Each box is a function in the call stack.
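  • Width is proportional to the number of samples containing the function, so it shows the CPU time spent in it and its callees combined.
  • Stacking puts each callee on top of its caller, with the root at the bottom, so tall towers are deep call chains.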
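
To make the box widths concrete, here is a small sketch under the assumption that samples arrive as root-to-leaf tuples (the data is hypothetical). It collapses identical stacks into the "folded" text format (`frame;frame;frame count`) that Brendan Gregg's `flamegraph.pl` script takes as input. It then computes each box's width as the number of samples whose stack begins with that path:

```python
from collections import Counter

# Hypothetical samples, root frame first, leaf frame last.
SAMPLES = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "render"),
    ("main", "handle_request", "parse_json"),
    ("main", "log_line"),
]


def fold(samples):
    """Collapse identical stacks into 'root;...;leaf count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{path} {n}" for path, n in sorted(counts.items())]


def box_widths(samples):
    """Width of each box = number of samples whose stack starts with that path."""
    widths = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            widths[stack[:depth]] += 1  # every prefix of a stack is one box
    return widths


for line in fold(SAMPLES):
    print(line)
# main;handle_request;parse_json 3
# main;handle_request;render 1
# main;log_line 1

# Path length is the box's row: length 1 (the root) is the bottom row.
for path, width in sorted(box_widths(SAMPLES).items()):
    print(";".join(path), width)
```

The root box `main` is always the widest because every sample contains it. This is why the useful signal is a wide box higher up the stack.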

Read it by width, not height. Height only reflects stack depth, and left-to-right order is not a timeline. The widest boxes are where time is spent.

Reading clues

  • A wide leaf (a box with nothing stacked on top of it) is a function spending CPU in its own code, which makes it a good optimization target.
  • A wide frame topped by many narrow leaves suggests overhead spread thinly across many small calls, as with serialization or allocation.
  • Surprising frames often reveal accidental work, such as logging in a hot loop.
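
The first two clues can be expressed in terms of self time (samples where a function is the leaf) and total time (samples where it appears at all). The sketch below uses a hypothetical profile, and its `threshold` and `min_children` cutoffs are illustrative assumptions rather than values from any real tool:

```python
from collections import Counter, defaultdict

# Hypothetical profile: 10 samples, root frame first, leaf frame last.
SAMPLES = (
    [("main", "resize_image")] * 6  # one function doing real work itself
    + [
        ("main", "encode", "write_field_a"),  # "encode" is wide overall,
        ("main", "encode", "write_field_b"),  # but its time is spread thin
        ("main", "encode", "write_field_c"),  # over many small callees
        ("main", "encode", "write_field_d"),
    ]
)


def self_and_total(samples):
    """Self = samples where the function is the leaf; total = samples containing it."""
    self_counts = Counter(stack[-1] for stack in samples)
    total_counts = Counter()
    for stack in samples:
        total_counts.update(set(stack))
    return self_counts, total_counts


def classify(samples, threshold=0.3, min_children=3):
    """Hypothetical heuristic: flag wide leaves and wide plateaus of narrow leaves."""
    n = len(samples)
    self_c, total_c = self_and_total(samples)
    children = defaultdict(set)
    for stack in samples:
        for parent, child in zip(stack, stack[1:]):
            children[parent].add(child)
    # Wide leaf: a large share of samples end in this function's own code.
    wide_leaves = sorted(fn for fn, c in self_c.items() if c / n >= threshold)
    # Plateau: wide overall, but every callee on top of it is narrow.
    plateaus = sorted(
        fn
        for fn, c in total_c.items()
        if c / n >= threshold
        and len(children[fn]) >= min_children
        and all(total_c[ch] / n < threshold for ch in children[fn])
    )
    return wide_leaves, plateaus


print(classify(SAMPLES))  # (['resize_image'], ['encode'])
```

`resize_image` is a direct target. `encode` is better attacked as a whole, for example by avoiding the work it orchestrates, because no single callee is worth optimizing on its own.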

After the fix

Profile again to confirm the hot path shrank and that the bottleneck did not simply move elsewhere. Optimization without a fresh profile is guessing.
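
A sketch of that comparison, assuming both profiles sample the same workload at the same rate so that raw sample counts stand in for CPU time (the profiles and names are hypothetical). Comparing counts rather than percentages matters: once the total shrinks, untouched code takes a larger percentage even though it did not get slower.

```python
from collections import Counter

# Hypothetical profiles of the same workload at the same sampling rate,
# so sample counts are comparable as CPU time.
BEFORE = [("main", "resize_image")] * 6 + [("main", "encode")] * 4
AFTER = [("main", "resize_image")] * 1 + [("main", "encode")] * 4


def compare(before, after, target):
    """Check that the target shrank and that no other function grew to replace it."""
    before_self = Counter(stack[-1] for stack in before)
    after_self = Counter(stack[-1] for stack in after)
    # Functions whose own CPU samples increased: a sign the work just moved.
    grew = sorted(fn for fn in after_self if after_self[fn] > before_self[fn])
    hottest_after, _ = after_self.most_common(1)[0]
    return {
        "target_samples": (before_self[target], after_self[target]),
        "total_samples": (len(before), len(after)),
        "grew": grew,
        "hottest_after": hottest_after,
    }


print(compare(BEFORE, AFTER, "resize_image"))
# {'target_samples': (6, 1), 'total_samples': (10, 5), 'grew': [], 'hottest_after': 'encode'}
```

Here the fix held: `resize_image` fell from 6 samples to 1, nothing grew, and `encode` is simply the next-widest frame to look at.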

Key idea

Sample stacks into a flamegraph and optimize the widest frames, then reprofile to confirm the hot path actually shrank.

Check yourself

1. In a flamegraph, what does the width of a box represent?

2. What should you do after applying a CPU optimization?