Machine Learning

GPU Memory and the Roofline Model

Decide whether a kernel is limited by compute or by memory bandwidth.

6 min read · advanced

Two limits on speed

A GPU kernel is bounded by one of two resources: how fast it can do math, called compute, or how fast it can move data to and from memory, called memory bandwidth. Knowing which one binds a kernel tells you what to optimize.

Arithmetic intensity

The deciding quantity is arithmetic intensity: the number of math operations (FLOPs) performed per byte moved to or from memory.

  • Low-intensity kernels, like an elementwise add, move a lot of data per FLOP and are memory bound.
  • High-intensity kernels, like a large matrix multiply, reuse each loaded value many times and are compute bound.
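To make the two bullets concrete, here is a minimal sketch that counts FLOPs and bytes for each kernel. The function names are illustrative, not from any library. It assumes 4-byte (fp32) elements and that every input and output crosses the memory bus exactly once, which is the best case for reuse.

```python
def elementwise_add_intensity(n: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for c = a + b over n elements."""
    flops = n                               # one add per output element
    bytes_moved = 3 * n * bytes_per_elem    # read a, read b, write c
    return flops / bytes_moved


def matmul_intensity(m: int, n: int, k: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for C = A @ B with A (m x k) and B (k x n).

    Assumption: each matrix is read or written once, i.e. ideal on-chip reuse.
    """
    flops = 2 * m * n * k                   # one multiply and one add per inner-product term
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved


if __name__ == "__main__":
    print(elementwise_add_intensity(1_000_000))   # intensity does not depend on n
    print(matmul_intensity(4096, 4096, 4096))     # grows with matrix size
```

Under these assumptions the add comes out at 1/12 FLOP per byte whatever the array size, while a 4096 x 4096 x 4096 matmul reaches about 683 FLOPs per byte. A real matmul kernel moves more bytes than this ideal count, so its measured intensity is lower.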

The roofline

Plot attainable performance (FLOP/s) against arithmetic intensity (FLOPs per byte), usually on log-log axes. Two limits form a roof.

  • A sloped line set by memory bandwidth bounds low-intensity kernels.
  • A flat ceiling set by peak compute bounds high-intensity kernels.
  • The ridge point where they meet, at peak compute divided by peak bandwidth, marks the minimum intensity needed to saturate compute.
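The roof described above can be written as a single `min`. The sketch below uses made-up round hardware numbers (20 TFLOP/s peak compute, 1 TB/s bandwidth) that do not describe any particular GPU, and the function names are illustrative.

```python
# Hypothetical hardware limits, chosen as round numbers for illustration only.
PEAK_FLOPS = 20e12       # FLOP/s
PEAK_BANDWIDTH = 1e12    # bytes/s


def attainable_flops(intensity: float,
                     peak_flops: float = PEAK_FLOPS,
                     peak_bandwidth: float = PEAK_BANDWIDTH) -> float:
    """Roofline bound: the lower of the bandwidth slope and the compute ceiling."""
    return min(peak_flops, peak_bandwidth * intensity)


def ridge_point(peak_flops: float = PEAK_FLOPS,
                peak_bandwidth: float = PEAK_BANDWIDTH) -> float:
    """Intensity (FLOPs per byte) at which the slope meets the ceiling."""
    return peak_flops / peak_bandwidth


def classify(intensity: float,
             peak_flops: float = PEAK_FLOPS,
             peak_bandwidth: float = PEAK_BANDWIDTH) -> str:
    """Label a kernel by which side of the ridge point it falls on."""
    if intensity < ridge_point(peak_flops, peak_bandwidth):
        return "memory bound"
    return "compute bound"


if __name__ == "__main__":
    for name, intensity in [("elementwise add", 1 / 12), ("large matmul", 683.0)]:
        print(name, classify(intensity), attainable_flops(intensity))
```

With these assumed limits the ridge point is 20 FLOPs per byte. A kernel at 1/12 FLOP per byte is capped near 0.083 TFLOP/s, far below the compute ceiling, while one at several hundred FLOPs per byte is capped by the 20 TFLOP/s ceiling.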

A kernel sitting under the sloped part is memory bound, so improving data reuse or fusing operations helps more than faster math units. This is why techniques that cut memory traffic, such as kernel fusion and FlashAttention, give large speedups on memory-bound work.
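A rough traffic count shows why fusion pays off. The sketch below compares y = relu(a + b) run as two kernels against one fused kernel. It assumes 4-byte elements, no cache hits between the two kernels, and a purely memory-bound runtime equal to bytes moved divided by bandwidth. The names and the 1 TB/s figure are illustrative.

```python
def unfused_bytes(n: int, bytes_per_elem: int = 4) -> int:
    """Traffic for two separate kernels: t = a + b, then y = relu(t)."""
    add_kernel = 3 * n * bytes_per_elem     # read a, read b, write temporary t
    relu_kernel = 2 * n * bytes_per_elem    # read t back, write y
    return add_kernel + relu_kernel


def fused_bytes(n: int, bytes_per_elem: int = 4) -> int:
    """Traffic for one fused kernel: the temporary stays in registers."""
    return 3 * n * bytes_per_elem           # read a, read b, write y


def memory_bound_time(bytes_moved: int, peak_bandwidth: float) -> float:
    """Lower bound on runtime in seconds when bandwidth is the only limit."""
    return bytes_moved / peak_bandwidth


if __name__ == "__main__":
    n = 100_000_000
    bandwidth = 1e12                        # hypothetical 1 TB/s
    slow = memory_bound_time(unfused_bytes(n), bandwidth)
    fast = memory_bound_time(fused_bytes(n), bandwidth)
    print(f"unfused {slow * 1e3:.2f} ms, fused {fast * 1e3:.2f} ms, ratio {slow / fast:.2f}x")
```

Both versions do the same arithmetic, yet the fused kernel moves 3/5 of the bytes, so its bandwidth-limited runtime bound is 5/3 times lower under these assumptions.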

Key idea

The roofline model uses arithmetic intensity to classify a kernel as memory bound or compute bound; memory-bound kernels gain most from reducing data movement, not from faster arithmetic.

Check yourself

1. What does arithmetic intensity measure?

2. A kernel sits under the sloped part of the roofline. What is it bound by?

3. For a memory-bound kernel, what helps most?