The optimizer reorders too
Hardware is not the only source of reordering. The compiler freely rearranges, merges, and eliminates memory accesses when optimizing, as long as it preserves the observable behavior of a single thread running in isolation. That "as-if" rule ignores other threads entirely.
Examples of legal transforms
- Reordering independent loads and stores to hide latency or fill pipeline slots.
- Hoisting a load out of a loop into a register if nothing in the single-thread view writes to that location inside the loop.
- Dead store elimination and merging redundant accesses.
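Each is correct for one thread but can break code whose memory accesses another thread observes or modifies concurrently. For instance, a loop can spin forever on a flag whose load the compiler hoisted into a register, because the loop never rereads the value the other thread wrote.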
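The hoisting case can be sketched in C++ as a before/after pair. This is an illustration under assumptions, not the output of any particular compiler: the names `ready`, `spins_as_written`, and `spins_after_hoisting` are hypothetical, and the `max_spins` bound exists only so the example terminates.

```cpp
#include <cassert>

// Hypothetical flag that another thread is meant to set.
// It is deliberately a plain (non-atomic) bool.
bool ready = false;

// As written: the source rereads `ready` on every iteration.
int spins_as_written(int max_spins) {
    int spins = 0;
    while (!ready && spins < max_spins) {
        ++spins;
    }
    return spins;
}

// What an optimizer may legally produce: nothing inside the loop
// writes `ready`, so the load is done once and kept in a register.
int spins_after_hoisting(int max_spins) {
    const bool cached = ready;  // single load, hoisted out of the loop
    int spins = 0;
    while (!cached && spins < max_spins) {
        ++spins;
    }
    return spins;
}

// Single-threaded check: both versions behave identically,
// which is exactly why the as-if rule permits the transform.
void check_single_thread_equivalence() {
    ready = false;
    assert(spins_as_written(3) == spins_after_hoisting(3));
    ready = true;
    assert(spins_as_written(3) == spins_after_hoisting(3));
}
```

Run by one thread, the two functions cannot be told apart. The difference only shows if another thread sets `ready` while the loop is running: the hoisted version never notices. Writing a plain `bool` from one thread while another reads it is also a data race, which C++ treats as undefined behavior.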
Stopping unwanted reordering
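You constrain the compiler with largely the same tools that constrain hardware: atomics, fences, and lock operations. These act as optimization barriers, telling the compiler it may not move accesses across them or assume a value is unchanged. volatile is weaker than it looks: in C and C++ it forces each access to the volatile object to actually be performed, which prevents hoisting or eliminating that access, but it does not order the surrounding non-volatile accesses, does not constrain the hardware, and is not a synchronization primitive. (Java's volatile, by contrast, does carry ordering guarantees.) Without a real barrier, the compiler is fully entitled to optimize away the accesses your synchronization depends on.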
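As a minimal sketch of the atomic approach in C++, the flag below is a `std::atomic<bool>` with release/acquire ordering. The names `payload`, `data_ready`, `publish`, `try_consume`, and `wait_and_consume` are hypothetical, chosen only for this illustration.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                      // plain data, published via the flag
std::atomic<bool> data_ready{false};  // atomic flag: loads cannot be hoisted away

// Producer side: write the data, then set the flag.
void publish(int value) {
    payload = value;
    // Release store: the write to `payload` above may not be moved
    // below this store by the compiler or the hardware.
    data_ready.store(true, std::memory_order_release);
}

// Consumer side: returns true and fills `out` once the flag is set.
bool try_consume(int& out) {
    // Acquire load: the read of `payload` below may not be moved
    // above this load.
    if (!data_ready.load(std::memory_order_acquire)) {
        return false;
    }
    out = payload;
    return true;
}

// Spin until the data is published; every iteration performs a real load.
int wait_and_consume() {
    int value = 0;
    while (!try_consume(value)) {
    }
    return value;
}

// Single-threaded smoke check of the handoff logic.
void single_thread_smoke_check() {
    publish(7);
    assert(wait_and_consume() == 7);
}
```

In real use, `publish` would run on one thread and `wait_and_consume` on another. The release/acquire pair is what guarantees that a consumer which sees `data_ready == true` also sees the value written to `payload`.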
Key idea
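Compilers reorder, hoist, and eliminate memory accesses under the single-thread as-if rule, so you must use atomics, fences, or locks as barriers to preserve cross-thread ordering. In C and C++, volatile alone keeps an access from being optimized away but does not provide that ordering.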