The remedy for false sharing
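When per-thread variables written by different cores share a cache line, they cause false sharing: every write invalidates the whole line in the other cores' caches, even though the threads never touch each other's data. Cache line padding is the deliberate technique of placing each hot variable on its own line so that writes from different cores never land on the same line.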
How padding works
- Find the cache line size, commonly sixty-four bytes.
- Add filler bytes after a variable so the next hot variable starts on a fresh line.
- Alternatively, align the variable to a line boundary with the language's alignment features, such as alignas in C or C++.
A common pattern is an array of per-thread counters in which each entry is padded out to a full cache line. Each core then writes only to its own line, so no line bounces between cores through coherence invalidations.
The trade off
Padding costs memory: a counter that needs only four bytes may now occupy sixty-four. This is usually worth it for a few very hot variables but wasteful if applied everywhere, so padding is best reserved for contended fields identified by profiling.
Key idea
Cache line padding spaces hot variables onto separate lines to eliminate false sharing, trading some memory for fewer coherence invalidations.