Machine Learning

The Memory Bandwidth Bound

When moving data, not doing math, decides how fast a kernel runs.

4 min read · intro

Two ways to be slow

A GPU operation can be limited by how fast it computes or by how fast it moves data. A memory-bandwidth-bound kernel spends most of its time waiting on reads and writes to main memory, leaving its arithmetic units mostly idle.
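As a rough sketch of the two limits, assume compute and memory transfers overlap perfectly. A kernel's runtime is then bounded below by whichever takes longer: its compute time or its memory time. The peak figures and the `kernel_time` helper are hypothetical, for illustration only, and do not describe any specific GPU.

```python
# Lower bound on runtime, assuming compute and memory traffic overlap perfectly:
# the slower of the two decides how long the kernel takes.
# PEAK_FLOPS and BANDWIDTH are hypothetical placeholders, not a real GPU's specs.

PEAK_FLOPS = 50e12  # hypothetical peak compute, FLOP/s
BANDWIDTH = 1e12    # hypothetical memory bandwidth, bytes/s


def kernel_time(flops, bytes_moved, peak_flops=PEAK_FLOPS, bandwidth=BANDWIDTH):
    """Return (estimated seconds, limiting resource) for one kernel."""
    t_compute = flops / peak_flops
    t_memory = bytes_moved / bandwidth
    limiter = "memory" if t_memory > t_compute else "compute"
    return max(t_compute, t_memory), limiter


# Vector add over n float32 values: n additions; read 2n values, write n values,
# 4 bytes each, so 12 * n bytes in total.
n = 50_000_000
t, limiter = kernel_time(flops=n, bytes_moved=12 * n)
busy = (n / PEAK_FLOPS) / t  # fraction of the runtime the math units actually need
print(f"~{t * 1e3:.2f} ms, {limiter} bound; math units busy ~{busy:.1%} of the time")
```

With these placeholder numbers the memory time is several hundred times the compute time, which is what idle math units look like in practice.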

Arithmetic intensity

The key metric is arithmetic intensity: the number of floating-point operations (FLOPs) performed per byte moved to or from main memory.

  • Low-intensity operations, such as adding two large vectors, do one addition per element but must read two inputs and write one output, so they do little math per byte and are bandwidth bound.
  • High-intensity operations, such as large matrix multiplies, reuse each loaded value many times and are compute bound.
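The contrast between the two kinds of operation can be made concrete with a small calculation. This sketch assumes float32 data and ideal reuse (each input read once, each output written once); the function names are illustrative.

```python
# Arithmetic intensity = FLOPs performed / bytes moved to or from main memory.
# Assumption: each input is read once and each output written once
# (ideal reuse), with 4-byte float32 elements.

FLOAT32_BYTES = 4


def vector_add_intensity(n, elem_bytes=FLOAT32_BYTES):
    """c = a + b over n elements: n adds; read a and b, write c."""
    flops = n
    bytes_moved = 3 * n * elem_bytes
    return flops / bytes_moved


def matmul_intensity(n, elem_bytes=FLOAT32_BYTES):
    """C = A @ B for n x n matrices: n**3 multiply-adds (2 * n**3 FLOPs);
    read A and B, write C."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * elem_bytes
    return flops / bytes_moved


for n in (1024, 4096):
    print(f"n={n}: vector add {vector_add_intensity(n):.3f} FLOP/byte, "
          f"matmul {matmul_intensity(n):.1f} FLOP/byte")
```

Under these assumptions the vector add stays at 1/12 FLOP/byte whatever n is, while the matmul figure grows as n/6. The matmul number is a best case: a real kernel only approaches it when tiling keeps blocks of A and B on chip.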

The roofline view

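Plotting attainable performance (FLOP/s) against arithmetic intensity gives a roofline: attainable performance = min(peak compute, bandwidth × intensity). On the left, performance rises along a slope set by memory bandwidth; on the right, it flattens at peak compute. The two roofs meet at the ridge point, and kernels whose intensity falls to its left are bandwidth bound.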
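A minimal sketch of the roofline as a function follows. The peak compute and bandwidth values are hypothetical placeholders chosen for round numbers; substitute the figures for a real device.

```python
# Roofline model: attainable FLOP/s = min(peak compute, bandwidth * intensity).
# Hardware figures are hypothetical placeholders; substitute your GPU's specs.

PEAK_FLOPS = 50e12      # hypothetical peak compute, FLOP/s
PEAK_BANDWIDTH = 1e12   # hypothetical memory bandwidth, bytes/s


def attainable_flops(intensity, peak=PEAK_FLOPS, bandwidth=PEAK_BANDWIDTH):
    """Sloped roof (bandwidth * intensity) on the left, flat roof (peak) on the right."""
    return min(peak, bandwidth * intensity)


def ridge_point(peak=PEAK_FLOPS, bandwidth=PEAK_BANDWIDTH):
    """Intensity in FLOP/byte where the two roofs meet."""
    return peak / bandwidth


for ai in (1 / 12, 10.0, 50.0, 170.0):
    regime = "bandwidth bound" if ai < ridge_point() else "compute bound"
    print(f"intensity {ai:7.3f} FLOP/byte -> "
          f"{attainable_flops(ai) / 1e12:6.2f} TFLOP/s ({regime})")
```

With these numbers the ridge point sits at 50 FLOP/byte, so a vector add at about 0.08 FLOP/byte reaches well under 1% of peak compute no matter how well its math is written.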

What to do about it

To speed up a bandwidth-bound kernel, reduce data movement rather than adding compute:

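  • Fuse operations so intermediate results stay in fast registers instead of round-tripping through main memory.
  • Use lower precision (for example, 16-bit instead of 32-bit floats) to move fewer bytes per element.
  • Improve locality so reused data stays in cache instead of being reloaded from main memory.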
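The first two techniques can be quantified by counting bytes. This sketch uses an illustrative elementwise chain, out = (a + b) * c, and assumes each separate kernel reads its inputs from and writes its outputs to main memory exactly once, with no cache hits between kernels. The bandwidth value and function names are hypothetical.

```python
# Bytes moved for the elementwise chain out = (a + b) * c over n elements.
# Assumptions (illustrative): every kernel reads its inputs from and writes its
# outputs to main memory once, with no cache hits between kernels; the
# 1 TB/s bandwidth is a hypothetical placeholder.

BANDWIDTH = 1e12  # hypothetical, bytes/s


def unfused_bytes(n, elem_bytes):
    # Kernel 1: t = a + b   -> read a, b; write t    (3n elements)
    # Kernel 2: out = t * c -> read t, c; write out  (3n elements)
    return 6 * n * elem_bytes


def fused_bytes(n, elem_bytes):
    # One kernel keeps t in registers: read a, b, c; write out (4n elements)
    return 4 * n * elem_bytes


n = 100_000_000
cases = [
    ("unfused fp32", unfused_bytes(n, 4)),
    ("fused fp32", fused_bytes(n, 4)),
    ("fused fp16", fused_bytes(n, 2)),
]
for label, nbytes in cases:
    # FLOPs are 2n in every case; only the memory traffic changes.
    print(f"{label:12s}: {nbytes / 1e9:.1f} GB moved, ~{nbytes / BANDWIDTH * 1e3:.1f} ms")
```

Fusion cuts traffic from 6n to 4n elements, and halving the element size cuts it in half again, so the bandwidth-bound time estimate drops by 3x while the arithmetic is unchanged (assuming 16-bit precision is acceptable for the workload).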

Key idea

A memory-bandwidth-bound kernel is limited by data movement, so low-arithmetic-intensity operations speed up by moving fewer bytes, not by adding compute.

Check yourself

1. What does arithmetic intensity measure?

2. How do you speed up a memory bandwidth bound kernel?