
Concurrency

The Store Buffer and Forwarding

How pending writes hide latency and create surprising reordering.


Hiding write latency

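When a core writes, it first needs exclusive ownership of the cache line. Obtaining it can take many cycles if the line is missing or shared with other cores. Rather than stall, the core parks the write in a store buffer and keeps executing later instructions. The write drains to the cache later, once ownership arrives.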
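The mechanism can be sketched as a toy Python model. Everything here is hypothetical:

- `Core`, `store` and `drain_one` are illustrative names.
- A dict stands in for the shared cache.
- The buffer drains in FIFO order. This is an assumption: it matches x86-style total store order, but weaker architectures may drain differently.

The point is only that `store` returns immediately, while the shared copy stays unchanged until a drain.

```python
from collections import deque


class Core:
    """Hypothetical toy core with a private FIFO store buffer (not a real hardware API)."""

    def __init__(self, memory):
        self.memory = memory          # shared dict standing in for the cache
        self.store_buffer = deque()   # pending (address, value) writes, oldest first

    def store(self, addr, value):
        # Park the write instead of waiting for line ownership; returns at once.
        self.store_buffer.append((addr, value))

    def drain_one(self):
        # Later, the oldest pending write becomes visible in the shared cache
        # (FIFO draining is an assumption of this model).
        addr, value = self.store_buffer.popleft()
        self.memory[addr] = value


memory = {"x": 0}
core = Core(memory)
core.store("x", 1)
print(memory["x"], len(core.store_buffer))  # 0 1  (write still buffered)
core.drain_one()
print(memory["x"], len(core.store_buffer))  # 1 0  (write has drained)
```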

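Store-to-load forwarding

If the same core soon reads an address it just wrote, reading the cache would return a stale value, because the write may still be sitting in the store buffer. Instead, store-to-load forwarding satisfies the load directly from the store buffer, taking the value from the youngest buffered store to that address. As a result, a thread always sees its own writes in program order.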
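A minimal sketch of forwarding follows. It is again a hypothetical Python model, not any real hardware interface. `load` scans the core's own buffer youngest-first and falls back to the shared dict only when no pending write matches. The model ignores partial overlaps and mismatched access sizes, which real forwarding logic has to handle.

```python
from collections import deque


class Core:
    """Hypothetical toy core with a private store buffer and store-to-load forwarding."""

    def __init__(self, memory):
        self.memory = memory          # shared dict standing in for the cache
        self.store_buffer = deque()   # pending (address, value) writes, oldest first

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Search youngest-first so the most recent buffered write to addr wins.
        for buf_addr, value in reversed(self.store_buffer):
            if buf_addr == addr:
                return value          # forwarded from this core's store buffer
        return self.memory[addr]      # no pending write: read the shared cache

    def drain_all(self):
        # Make every pending write visible, oldest first.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


memory = {"x": 0}
core = Core(memory)
core.store("x", 1)
core.store("x", 2)
print(core.load("x"), memory["x"])  # 2 0: the core sees its newest write; the cache does not yet
```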

The reordering catch

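The store buffer is private to one core. Other cores do not see a buffered write until it drains. This means:

  • A core's load of X can complete, returning whatever X currently holds in the cache, while its own earlier write to Y is still buffered.
  • To another core, that load appears to have happened before the write, even though the write came first in program order.

This is store-load reordering: a later load appears to take effect before an earlier store to a different address. It is the root of classic puzzles in which two threads each set one flag and then read the other's flag, and both read a stale zero. Forwarding preserves single-thread correctness, but the hardware still exposes the reordering across threads.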
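The flag puzzle can be replayed in the same kind of toy model. The names are hypothetical, and FIFO buffers with a dict as the shared cache are assumptions. Each core stores 1 to its own flag and then loads the other core's flag.

No interleaving of the four operations in program order can return `(0, 0)`: whichever load runs last comes after both stores. With the writes still buffered, however, the model returns `(0, 0)`. Draining both buffers before the loads gives `(1, 1)`.

```python
from collections import deque


class Core:
    """Hypothetical toy core: private FIFO store buffer plus forwarding."""

    def __init__(self, memory):
        self.memory = memory          # shared dict standing in for the cache
        self.store_buffer = deque()   # pending (address, value) writes, oldest first

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Forward from this core's own buffer first, youngest entry wins.
        for buf_addr, value in reversed(self.store_buffer):
            if buf_addr == addr:
                return value
        return self.memory[addr]

    def drain_all(self):
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


def store_buffering_test(drain_before_load):
    """Run the two-flag puzzle once; returns (core 0's read of y, core 1's read of x)."""
    memory = {"x": 0, "y": 0}
    c0, c1 = Core(memory), Core(memory)
    # Program order on each core: store its own flag, then load the other flag.
    c0.store("x", 1)
    c1.store("y", 1)
    if drain_before_load:
        c0.drain_all()
        c1.drain_all()
    r0 = c0.load("y")  # core 0 reads y; core 1's write may still be buffered
    r1 = c1.load("x")  # core 1 reads x; core 0's write may still be buffered
    return r0, r1


print(store_buffering_test(drain_before_load=False))  # (0, 0)
print(store_buffering_test(drain_before_load=True))   # (1, 1)
```

Forwarding still keeps each core's own view consistent: core 0 would read its own `x` as 1 straight from its buffer. Yet neither core sees the other's write in time.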

Key idea

A store buffer lets a core continue past slow writes and forward the buffered values to its own loads. Because the buffer is private, other cores can observe store-load reordering.

Check yourself

1. What is the purpose of store-to-load forwarding?

2. Why does the store buffer cause store-load reordering?