
Concurrency

The Store Buffer Forwarding Deep Dive

How store buffers speed up stores yet cause the classic store-load reordering anomaly.

6 min read · advanced

Hiding store latency

Writing to memory is slow: a store may miss in the cache, and the core must then obtain the cache line before it can write it. Rather than stall, the CPU places each store into a store buffer and lets the core keep executing without waiting for the value to reach the cache. The buffer drains to the cache later, asynchronously, which hides write latency but introduces subtle ordering effects.
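The mechanism can be pictured with a small Python model. This is a sketch under stated assumptions, not how any real CPU is built: each core has one FIFO buffer, a plain dict stands in for the cache, and `Core`, `store`, and `drain_one` are hypothetical names. Calling `drain_one` by hand plays the role of the asynchronous drain.

```python
from collections import deque


class Core:
    """Toy model of one core with a FIFO store buffer (hypothetical, illustration only)."""

    def __init__(self, memory):
        self.memory = memory      # shared dict standing in for the cache
        self.buffer = deque()     # pending (address, value) stores, oldest first

    def store(self, addr, value):
        # The store completes immediately from the core's point of view:
        # it is only queued in the buffer, not written to the cache.
        self.buffer.append((addr, value))

    def drain_one(self):
        # Later, asynchronously, the oldest pending store reaches the cache.
        if self.buffer:
            addr, value = self.buffer.popleft()
            self.memory[addr] = value


memory = {"x": 0}
core = Core(memory)
core.store("x", 1)
print(memory["x"])  # 0: the store is still buffered
core.drain_one()
print(memory["x"])  # 1: the store has drained to the shared cache
```

The core never waited on the cache in `store`; the cost of the write was deferred to `drain_one`, which is exactly the latency the buffer hides.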

Store to load forwarding

If the same core later loads an address it has just stored to, it must see its own write. The core uses store-to-load forwarding: the load takes the pending value directly from the store buffer rather than from the cache, and if several buffered stores target that address, it takes the youngest one. This keeps a single thread self-consistent even before its store is globally visible.
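A minimal Python sketch of forwarding, assuming the same toy model as before (one FIFO buffer per core, a dict as the shared cache). `Core`, `store`, and `load` are hypothetical names. The model ignores partially overlapping accesses, where real hardware may have to stall the load instead of forwarding.

```python
from collections import deque


class Core:
    """Toy core whose loads forward from its own store buffer (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory      # shared dict standing in for the cache
        self.buffer = deque()     # pending (address, value) stores, oldest first

    def store(self, addr, value):
        self.buffer.append((addr, value))

    def load(self, addr):
        # Store-to-load forwarding: scan pending stores youngest first so the
        # load sees this core's most recent write to addr.
        for pending_addr, value in reversed(self.buffer):
            if pending_addr == addr:
                return value
        # No pending store to this address: read the shared cache.
        return self.memory[addr]


memory = {"x": 0}
core = Core(memory)
other = Core(memory)
core.store("x", 1)
core.store("x", 2)
print(core.load("x"))   # 2: forwarded from the youngest buffered store
print(other.load("x"))  # 0: another core still sees the old value
```

The same load instruction returns different values on the two cores, which is the seed of the anomaly in the next section.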

The store buffer anomaly

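The famous problem is the store buffering litmus test: two threads each store to one variable and then load the other. With x = y = 0 initially, thread 0 runs x = 1; r1 = y while thread 1 runs y = 1; r2 = x. Because each store can still sit in its core's private buffer, invisible to the other core, both loads can read the old value and produce r1 == 0 and r2 == 0. Sequential consistency forbids this outcome: in a single global order one of the stores comes first, so the load that executes last must see a 1.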

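  • Forwarding satisfies the local thread but does nothing for remote threads.
  • This is why store-load reordering (a store appearing to take effect after a later load to a different address) is allowed on common hardware; even x86, whose TSO model is comparatively strict, permits it for ordinary loads and stores.
  • A full fence (for example MFENCE on x86) forbids this anomaly: loads after the fence cannot be performed until every earlier buffered store has become globally visible.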

Key idea

Store buffers hide write latency and forward to local loads, but because buffered stores are not yet visible to other cores, they permit store-load reordering unless a full fence intervenes.

Check yourself

1. What is store-to-load forwarding?

2. Why can both loads read old values in the store buffer anomaly?