Memory Barriers and Fences
Modern processors and compilers reorder memory operations to run faster. A store may be delayed in a store buffer, a load may be hoisted above earlier operations, and two unrelated writes may become visible to other cores in an order different from the one the program text specifies. A memory barrier, also called a fence, is an instruction that forbids certain reorderings across it.
Barriers come in flavors based on what they order:
- Load-load: Loads before the barrier complete before loads after it.
- Store-store: Stores before the barrier become visible before stores after it.
- Load-store and store-load: These order the mixed cases. Store-load is the strongest and most expensive, because earlier stores must become visible before any later load executes.
- Full barrier: Orders every combination at once.
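The store-load case can be illustrated with the classic store-buffering test. The following is a minimal sketch in C++, assuming a C++11 or later compiler with thread support. The names StoreBufferingResult, run_store_buffering, and check_store_buffering are hypothetical and chosen only for this example. Standard C++ has no store-load-only fence, so the sketch uses a sequentially consistent fence, which acts as a full barrier.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type for this illustration.
struct StoreBufferingResult {
    int r1;  // what thread A read from y
    int r2;  // what thread B read from x
};

// Each thread stores to one variable, then loads the other.
// Without a store-load barrier, both loads may return 0 because each
// store can still be sitting in a store buffer when the load executes.
StoreBufferingResult run_store_buffering() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    StoreBufferingResult result{-1, -1};

    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        // Full barrier: the store above is ordered before the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        result.r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        result.r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return result;
}

// Repeats the test; with both fences in place, the outcome
// r1 == 0 && r2 == 0 is forbidden by the C++ memory model.
void check_store_buffering(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        StoreBufferingResult r = run_store_buffering();
        assert(!(r.r1 == 0 && r.r2 == 0));
    }
}
```

Removing either fence makes the both-zero outcome legal again, although how often it actually appears depends on the hardware and on timing.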
A common use is the publish pattern. A writer thread fills in a data structure, issues a store-store barrier, then sets a ready flag. A reader sees the flag set, issues a load-load barrier, then reads the data. The writer's barrier guarantees that the data writes become visible before the flag, and the reader's barrier guarantees that the data reads happen after the flag read, so the reader never sees a half-built structure.
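The publish pattern can be sketched in C++ with explicit fences. This is an illustration under stated assumptions, not a definitive implementation: FencedBox, fenced_publish, fenced_consume, and run_fenced_demo are hypothetical names, and because standard C++ offers no pure store-store or load-load fence, the sketch uses release and acquire fences, which are somewhat stronger than the barriers described above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical container for this illustration: plain data plus a ready flag.
struct FencedBox {
    int payload[4] = {0, 0, 0, 0};   // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // the ready flag
};

// Writer: fill in the data, fence, then set the flag.
void fenced_publish(FencedBox& box) {
    for (int i = 0; i < 4; ++i) {
        box.payload[i] = i + 1;
    }
    // Release fence: the payload writes above cannot be reordered
    // past the flag store below.
    std::atomic_thread_fence(std::memory_order_release);
    box.ready.store(true, std::memory_order_relaxed);
}

// Reader: wait for the flag, fence, then read the data.
int fenced_consume(FencedBox& box) {
    while (!box.ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: the payload reads below cannot be reordered
    // before the flag load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    int sum = 0;
    for (int v : box.payload) {
        sum += v;
    }
    return sum;
}

// Runs one writer and one reader and returns the sum the reader observed.
int run_fenced_demo() {
    FencedBox box;
    int sum = 0;
    std::thread reader([&] { sum = fenced_consume(box); });
    std::thread writer([&] { fenced_publish(box); });
    writer.join();
    reader.join();
    return sum;
}
```

Once the reader observes the flag as true, the two fences together ensure it reads the fully written payload, so run_fenced_demo returns 10 (1 + 2 + 3 + 4).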
Barriers are expensive because they can stall the pipeline and force store buffers to drain, so they should be used sparingly. Higher-level atomics with acquire and release semantics package the needed barriers, so most programmers never need to write raw fences.
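For comparison, here is a minimal sketch of the same handoff written with acquire and release operations on the flag itself, with no explicit fence. The names ReleaseBox and run_acquire_release_demo are hypothetical, and the payload value is arbitrary.

```cpp
#include <atomic>
#include <thread>

// Hypothetical container for this illustration.
struct ReleaseBox {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag carrying the ordering
};

// Runs one writer and one reader and returns the value the reader observed.
int run_acquire_release_demo() {
    ReleaseBox box;
    int seen = 0;

    std::thread reader([&] {
        // Acquire load: later reads cannot move before it.
        while (!box.ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = box.payload;
    });
    std::thread writer([&] {
        box.payload = 42;
        // Release store: earlier writes cannot move after it.
        box.ready.store(true, std::memory_order_release);
    });

    writer.join();
    reader.join();
    return seen;
}
```

The ordering is attached to the flag operations, so the compiler can emit whatever barrier instructions the target needs. On some architectures that may be none at all.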
Key idea
Memory barriers forbid specific reorderings of loads and stores, so threads can publish and observe data in the intended order despite hardware and compiler reordering.