Memory Ordering For Lock Free

How acquire and release semantics keep lock-free publication visible and correct.

Hardware reorders memory

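CPUs and compilers reorder loads and stores for speed. Within a single thread this is invisible, because the program still behaves as if it ran in order. Across threads, atomicity alone is not enough. An atomic CAS that swings a pointer to a new node is indivisible, yet without ordering constraints the writes that initialized the node may become visible to another thread after the pointer that publishes it. A reader that follows the pointer can then see a node with garbage contents.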

Acquire and release

The C and C++ memory model (since C11 and C++11) gives lock-free code precise control by attaching an ordering to each atomic operation:

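A release store guarantees that all writes the storing thread made before it become visible to any thread whose acquire load of that same location reads the stored value.
An acquire load guarantees that the loading thread's later reads and writes cannot move before it, so once it reads the value written by a matching release store, those later reads see everything the releasing thread wrote before that store.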
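A minimal message-passing sketch of the two rules above, assuming C++11 or later with std::thread. The names payload, ready, producer, consumer, and run_message_passing are hypothetical. The producer writes an ordinary int and then does a release store to a flag; the consumer spins on an acquire load of the flag and only then reads the int.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state: a plain (non-atomic) payload guarded by an atomic flag.
int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                  // ordinary write
    ready.store(true, std::memory_order_release);  // release: publishes the write above
}

int consumer() {
    // Acquire: once this load returns true, the write to payload is visible.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // reads 42, and there is no data race
}

int run_message_passing() {
    payload = 0;  // reset before the threads start; thread creation orders these writes
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Replacing both orderings with std::memory_order_relaxed would turn the read of payload into a data race, which is undefined behavior even if it happens to return 42 on a given machine.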

Pairing a release store on publish with an acquire load on read creates a happens-before edge, so the reader observes the fully initialized node.
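The same pairing applies when the published object is a node reached through a pointer, the situation described under Hardware reorders memory. The sketch below is a push-only lock-free stack (a Treiber-style push) in which the successful CAS uses release ordering and readers load the head with acquire. Node, head, push, sum_all, and run_stack_demo are illustrative names, not a library API. Nodes are never removed while threads run, which sidesteps the memory-reclamation and ABA problems a real pop would have to solve.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

void push(int v) {
    // Fully initialize the node before it can be published.
    Node* n = new Node{v, head.load(std::memory_order_relaxed)};
    // Success: release publishes n->value and n->next along with the pointer.
    // Failure: n->next is refreshed with the current head and we retry.
    while (!head.compare_exchange_weak(n->next, n,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
    }
}

long sum_all() {
    long total = 0;
    // Acquire pairs with the release CAS, so every reachable node is initialized.
    for (Node* p = head.load(std::memory_order_acquire); p != nullptr; p = p->next) {
        total += p->value;
    }
    return total;
}

long run_stack_demo() {
    std::vector<std::thread> writers;
    for (int t = 0; t < 4; ++t) {
        writers.emplace_back([] { for (int i = 1; i <= 1000; ++i) push(i); });
    }
    // A concurrent reader traverses the list while pushes are in flight.
    std::thread reader([] { for (int k = 0; k < 100; ++k) (void)sum_all(); });
    for (auto& w : writers) w.join();
    reader.join();
    long total = sum_all();
    // Single-threaded cleanup once every thread has finished.
    Node* p = head.exchange(nullptr);
    while (p != nullptr) {
        Node* next = p->next;
        delete p;
        p = next;
    }
    return total;
}
```

The failure ordering is relaxed because a failed CAS only refreshes n->next and publishes nothing. Readers still see nodes pushed by other threads, because each successful CAS is a read-modify-write that continues the release sequence of the earlier push.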

The ordering levels

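Relaxed (memory_order_relaxed) gives atomicity only and no ordering; it is fine for independent counters.
Acquire and release (memory_order_acquire, memory_order_release, and memory_order_acq_rel for read-modify-write operations) order memory around a synchronization point and are the workhorse of lock-free code.
Sequentially consistent (memory_order_seq_cst, the default for std::atomic operations) adds a single total order over all seq_cst operations that every thread agrees on; it is the simplest to reason about but usually the most expensive.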
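A sketch contrasting the two ends of the list above, assuming C++11 or later; hits, count_hits, x, y, and both_read_zero are hypothetical names. The relaxed counter only needs every increment to be atomic. The store-buffering test is the classic case where acquire and release are not enough: each thread stores to one variable and then loads the other, and only seq_cst rules out both loads returning 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Relaxed: an independent event counter. Only atomicity is needed.
std::atomic<long> hits{0};

long count_hits(int threads, int per_thread) {
    hits.store(0, std::memory_order_relaxed);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i)
                hits.fetch_add(1, std::memory_order_relaxed);  // no increment is lost
        });
    }
    for (auto& th : pool) th.join();  // join() orders the final read after every increment
    return hits.load(std::memory_order_relaxed);
}

// Sequentially consistent: the store-buffering test.
std::atomic<int> x{0}, y{0};

// Returns true if both threads read 0. seq_cst forbids that outcome because
// all four operations must fit into one total order; with only release
// stores and acquire loads it is allowed and can be observed on x86 hardware.
bool both_read_zero() {
    x.store(0);
    y.store(0);
    int r1 = -1, r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}
```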

Choosing the weakest ordering that is still correct is the core skill of low-level lock-free engineering.
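Reference counting shows how the weakest sufficient ordering can differ per operation. In this sketch (hypothetical names RefCounted, add_ref, drop_ref, destroyed, and run_refcount_demo), the increment is relaxed because a thread can only add a reference while already holding one, whereas the decrement uses acq_rel so the thread that drops the last reference sees every other thread's prior use of the object before deleting it. A common refinement uses a release decrement followed by an acquire fence only on the final drop; acq_rel is used here for simplicity.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical intrusive reference-counted object; starts with one reference.
struct RefCounted {
    std::atomic<int> refs{1};
};

std::atomic<int> destroyed{0};  // counts deletions, for checking the demo

void add_ref(RefCounted* obj) {
    // Relaxed: the caller already holds a reference, so nothing else needs ordering.
    obj->refs.fetch_add(1, std::memory_order_relaxed);
}

void drop_ref(RefCounted* obj) {
    // acq_rel: release publishes this thread's use of *obj; acquire lets the
    // thread that drops the last reference see all of it before deleting.
    if (obj->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        destroyed.fetch_add(1, std::memory_order_relaxed);
        delete obj;
    }
}

int run_refcount_demo() {
    destroyed.store(0);
    auto* obj = new RefCounted;  // this thread owns the initial reference
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t) {
        add_ref(obj);  // each worker receives its own reference before it starts
        pool.emplace_back([obj] {
            for (int i = 0; i < 1000; ++i) {
                add_ref(obj);
                drop_ref(obj);
            }
            drop_ref(obj);  // give back the worker's reference
        });
    }
    drop_ref(obj);  // the creator lets go; whichever thread is last deletes
    for (auto& th : pool) th.join();
    return destroyed.load();
}
```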

Key idea

Lock-free correctness needs more than atomicity: pairing a release store on publish with an acquire load on read forms a happens-before edge, so readers see fully initialized data.
