Layers of storage
A modern machine stores data at several levels, each larger but slower than the one above:
- Registers sit inside the core and are the fastest.
- L1 cache is small and takes a few cycles.
- L2 and L3 caches are larger and take tens of cycles.
- Main memory takes hundreds of cycles.
Orders of magnitude
The jump between levels is large. An L1 hit may cost a few cycles, while a trip to main memory can cost roughly two orders of magnitude more. Because of this gap, a program that frequently misses the cache is bound by memory latency rather than by arithmetic.
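To make the gap concrete, here is a minimal sketch of the expected cost per access when a single cache level sits in front of main memory. The function name and the latencies (4 cycles for a hit, 200 for a miss) are illustrative assumptions chosen to match the "few cycles" and "hundreds of cycles" figures above, not measurements of any particular machine.

```python
def average_access_cycles(hit_rate, hit_cycles=4, miss_cycles=200):
    """Expected cycles per access for one cache level in front of main memory.

    hit_cycles and miss_cycles are assumed, illustrative latencies.
    """
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


if __name__ == "__main__":
    # Compare a cache-friendly program with progressively worse hit rates.
    for rate in (0.99, 0.90, 0.50):
        cycles = average_access_cycles(rate)
        print(f"hit rate {rate:.0%}: {cycles:.2f} cycles per access")
```

Under these assumed numbers, dropping from a 99% to a 90% hit rate raises the average cost from about 6 to about 24 cycles per access, which is why a modest rise in misses can dominate run time.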
Why locality wins
Caches hold recently used data and the data stored next to it. Code that reuses values and walks memory in order keeps its work in the fast levels. This is why reuse and sequential access, known as temporal and spatial locality respectively, matter so much for concurrent and serial code alike.
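The effect of access order can be sketched with a toy cache model. The simulator below is a hypothetical, simplified direct-mapped cache (64 lines of 64 bytes, both assumed), and the matrix size and element width are likewise assumptions. It counts misses only and does not model real hardware features such as associativity or prefetching.

```python
def count_misses(addresses, num_lines=64, line_bytes=64):
    """Count misses in a toy direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_lines          # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block that contains this byte
        index = block % num_lines      # the one cache line this block may occupy
        if tags[index] != block:
            misses += 1
            tags[index] = block        # replace whatever the line held before
    return misses


ELEM = 8            # bytes per element (assumed)
ROWS = COLS = 256   # matrix dimensions (assumed), stored row by row

# Same set of addresses, visited in two different orders.
row_order = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
col_order = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

if __name__ == "__main__":
    print("row-order misses:   ", count_misses(row_order))
    print("column-order misses:", count_misses(col_order))
```

With these assumed parameters, the row-order walk misses once per cache line, or 8192 of 65536 accesses, because eight consecutive elements share a line. The column-order walk touches a different line on every step and evicts each line before returning to it, so all 65536 accesses miss in this model.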
Key idea
Each step down the memory hierarchy is much slower, so keeping hot data in cache through locality often matters more than raw computation.