Hardware reorders memory
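CPUs and compilers reorder loads and stores for speed. On a single thread this is invisible, because the reordering preserves that thread's own view of program order. Across threads, however, atomicity alone is not enough. A CAS that swings a pointer to a new node is indivisible, yet without ordering constraints the plain writes that initialized the node may become visible to another thread after the pointer that publishes it, so a reader can follow the pointer and see a node with garbage contents.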
Acquire and release
The C and C++ memory model gives lock-free code precise control over this ordering:
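- A release store guarantees that writes before it in the same thread are not reordered after it, so they become visible to any thread whose acquire load of that same atomic object reads the value it stored.
- An acquire load guarantees that reads and writes after it in the same thread are not reordered before it, so once it reads the value written by a release store, those later reads see everything written before that release.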
Pairing a release store on publish with an acquire load on read makes the store synchronize with the load, which creates a happens-before edge, so the reader observes the fully initialized node.
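A minimal C++ sketch of that publish/read pairing, assuming a hypothetical `Node` type and hypothetical helper names `publish`, `consume`, and `publish_and_read` (none of these come from a library). The writer fills in the node with ordinary writes and publishes the pointer with a release store; the reader spins on an acquire load and can then safely read the node's fields.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// Hypothetical node type, used only for this illustration.
struct Node {
    int value;
    std::string label;
};

// Writer: initialize the node with plain writes, then publish it.
void publish(std::atomic<Node*>& slot) {
    Node* n = new Node{42, "ready"};
    // Release store: the initializing writes above may not be reordered
    // after it, so a thread whose acquire load reads this pointer also
    // sees value == 42 and label == "ready".
    slot.store(n, std::memory_order_release);
}

// Reader: spin until the pointer appears, then hand it back.
Node* consume(std::atomic<Node*>& slot) {
    Node* n;
    // Acquire load: once it returns the writer's pointer, the release
    // store synchronizes with this load (the happens-before edge).
    while ((n = slot.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return n;
}

// Runs one writer and one reader; returns what the reader observed.
std::pair<int, std::string> publish_and_read() {
    std::atomic<Node*> slot{nullptr};
    Node* seen = nullptr;
    std::thread reader([&] { seen = consume(slot); });
    std::thread writer([&] { publish(slot); });
    writer.join();
    reader.join();  // join() makes the reader's write to `seen` visible here
    assert(seen != nullptr);
    std::pair<int, std::string> result{seen->value, seen->label};
    delete seen;
    return result;
}
```

Calling `publish_and_read()` returns `{42, "ready"}`. Downgrading both operations to `memory_order_relaxed` would keep each pointer access atomic but remove the happens-before edge, which reintroduces the garbage-contents bug described earlier, even if a given test run happens not to show it.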
The ordering levels
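- Relaxed (memory_order_relaxed) gives atomicity only, with no ordering of surrounding reads and writes; it is fine for independent counters.
- Acquire and release (memory_order_acquire, memory_order_release, and memory_order_acq_rel for read-modify-write operations) order memory around a synchronization point and are the workhorse of lock-free code.
- Sequentially consistent (memory_order_seq_cst, the default) adds a single total order over all seq_cst operations that every thread agrees on; it is the simplest to reason about but typically the most expensive.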
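A sketch contrasting the two ends of that list, using hypothetical helper names `count_relaxed` and `store_buffering_both_zero`. The first shows a relaxed counter that never loses an increment; the second is the classic store-buffering litmus test, where seq_cst forbids an outcome that acquire/release alone would allow.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Relaxed: each fetch_add is atomic, so no increment is lost, but the
// increments impose no ordering on any other memory. Suitable for a
// counter that no other data depends on.
long count_relaxed(int threads, int per_thread) {
    std::atomic<long> hits{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                hits.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    // join() synchronizes with each worker, so this load sees every increment.
    return hits.load(std::memory_order_relaxed);
}

// Sequentially consistent: store buffering (Dekker-style) litmus test.
// Each thread stores to its own flag, then loads the other thread's flag.
// All four seq_cst operations fall into one total order consistent with
// program order, so at least one load must see the other thread's store:
// r1 == 0 && r2 == 0 is forbidden. With release stores and acquire loads
// instead, that outcome is permitted.
bool store_buffering_both_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}
```

`count_relaxed(4, 10000)` returns 40000. Note that repeatedly running `store_buffering_both_zero()` and never seeing `true` does not prove anything by itself; the guarantee comes from the seq_cst total order, and the loop only illustrates it.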
Choosing the weakest ordering that is still correct is the core skill of low-level lock-free engineering.
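As an illustration of picking the weakest correct ordering, here is a sketch of a push-only, Treiber-style stack in which producers publish nodes with a CAS and a consumer detaches the whole list at once. `StackNode`, `PushStack`, and `push_and_drain` are hypothetical names for this example. It deliberately omits single-node pop, which would additionally need ABA protection and safe memory reclamation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical intrusive stack node for this sketch.
struct StackNode {
    int value;
    StackNode* next;
};

// Push-only lock-free stack. Taking the whole list with exchange avoids
// the ABA and reclamation problems a single-node pop would raise.
class PushStack {
public:
    void push(int v) {
        // Relaxed initial read: n->next is only stored, never dereferenced.
        StackNode* n = new StackNode{v, head_.load(std::memory_order_relaxed)};
        // Success: release, so the node's fields are visible to whoever
        // acquires this head. Failure: relaxed suffices, because the observed
        // head is only copied into n->next before retrying.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
            // n->next now holds the current head; try again.
        }
    }

    // Detach every node published so far; the caller owns and frees them.
    // Acquire pairs with the release CAS in push(). Nodes pushed earlier stay
    // visible because later successful CASes continue their release sequence.
    StackNode* take_all() {
        return head_.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<StackNode*> head_{nullptr};
};

// Producers each push 0..per_producer-1 while one consumer drains
// concurrently. Returns the sum of all values the consumer read.
long push_and_drain(int producers, int per_producer) {
    PushStack stack;
    const long expected = static_cast<long>(producers) * per_producer;
    long sum = 0;
    std::thread consumer([&] {
        long drained = 0;
        while (drained < expected) {
            StackNode* n = stack.take_all();
            if (n == nullptr) {
                std::this_thread::yield();
                continue;
            }
            while (n != nullptr) {
                StackNode* next = n->next;
                sum += n->value;  // safe: published by a release CAS
                delete n;
                ++drained;
                n = next;
            }
        }
    });
    std::vector<std::thread> pool;
    for (int p = 0; p < producers; ++p) {
        pool.emplace_back([&stack, per_producer] {
            for (int i = 0; i < per_producer; ++i) stack.push(i);
        });
    }
    for (auto& th : pool) th.join();
    consumer.join();
    return sum;
}
```

With four producers pushing 0 through 999 each, `push_and_drain(4, 1000)` returns 4 × 499500 = 1998000. Using seq_cst everywhere would also be correct, but it would add cost without adding any guarantee this structure needs.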
Key idea
Lock-free correctness needs more than atomicity: pairing a release store on publish with an acquire load on read forms a happens-before edge, so readers see fully initialized data.