A hidden performance bug
False sharing happens when two threads modify different variables that happen to sit on the same cache line. Coherence works on whole lines, typically 64 bytes, so each write invalidates the other core's copy even though the threads never touch the same data.
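To make "the same cache line" concrete, here is a minimal sketch of the arithmetic, assuming 64-byte lines. The constant kAssumedLineSize, the helper line_index, and the addresses are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Assumed line size; 64 bytes is typical, but the real value is hardware-specific.
constexpr std::uintptr_t kAssumedLineSize = 64;

// Hypothetical helper: which cache line does a byte address fall into?
constexpr std::uintptr_t line_index(std::uintptr_t address) {
    return address / kAssumedLineSize;
}

// Two 8-byte variables at made-up addresses 0x1000 and 0x1008 land in the same line,
// so writes to either one invalidate the whole line in the other core's cache.
static_assert(line_index(0x1000) == line_index(0x1008), "same line");

// A variable 64 bytes further on starts the next line and is unaffected.
static_assert(line_index(0x1000) != line_index(0x1040), "different lines");
```

Any two addresses that give the same line_index are candidates for false sharing, however unrelated the variables are in the program's logic.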
Why it hurts
Under a MESI-style protocol, every write by one core puts the line into the Modified state in that core's cache and the Invalid state everywhere else. The other core then takes a coherence miss when it writes its own variable, so the line bounces back and forth.
- The line ping-pongs between caches on every write.
- Throughput collapses even though the code looks embarrassingly parallel.
- It is invisible in the source, since the variables are logically independent.
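The pattern can be sketched in a few lines of C++. This is an illustration under stated assumptions, not a benchmark: the struct SharedLineCounters and the function hammer_counters are hypothetical names, and the 64-byte line size is assumed rather than queried from the hardware.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumed line size; 64 bytes is typical, but the real value is hardware-specific.
constexpr std::size_t kAssumedLineSize = 64;

// Two logically independent counters declared side by side.
// Together they are far smaller than one line, so they will usually share it.
struct SharedLineCounters {
    std::atomic<std::uint64_t> a{0};  // written only by the first thread
    std::atomic<std::uint64_t> b{0};  // written only by the second thread
};
static_assert(sizeof(SharedLineCounters) <= kAssumedLineSize,
              "both counters fit inside a single assumed cache line");

// Each thread increments only its own counter. No data is shared logically,
// yet every increment may invalidate the line in the other core's cache.
inline void hammer_counters(SharedLineCounters& c, std::uint64_t iterations) {
    std::thread first([&] {
        for (std::uint64_t i = 0; i < iterations; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread second([&] {
        for (std::uint64_t i = 0; i < iterations; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    first.join();
    second.join();
}
```

The results are always correct, which is exactly why the problem is hard to spot: only the running time suffers, and by how much depends on the machine.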
Fixing it
The cure is to keep contended variables on separate cache lines.
- Pad each hot variable so it occupies its own line.
- Align per-thread data to cache-line boundaries.
- Group read-only data together and isolate frequently written counters.
Languages offer help: C++ has the alignas specifier, manual padding fields work in most languages, and the JDK provides the @Contended annotation, which inserts padding around a field.
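A minimal sketch of the padding and alignment approach in C++, again assuming 64-byte lines. PaddedCounter, PerThreadCounters, and on_distinct_lines are hypothetical names; where the standard library provides std::hardware_destructive_interference_size, it could replace the hard-coded constant.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed line size; 64 bytes is typical, but the real value is hardware-specific.
constexpr std::size_t kAssumedLineSize = 64;

// alignas raises the alignment to one line, and the compiler then rounds the
// size up to a multiple of that alignment, so the counter owns a whole line.
struct alignas(kAssumedLineSize) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(alignof(PaddedCounter) == kAssumedLineSize, "starts on a line boundary");
static_assert(sizeof(PaddedCounter) == kAssumedLineSize, "fills exactly one line");

// Per-thread slots: slot i is meant to be written only by thread i.
struct PerThreadCounters {
    PaddedCounter slots[4];
};

// Helper to check that two counters never fall into the same assumed line.
inline bool on_distinct_lines(const PaddedCounter& x, const PaddedCounter& y) {
    auto line = [](const PaddedCounter* p) {
        return reinterpret_cast<std::uintptr_t>(p) / kAssumedLineSize;
    };
    return line(&x) != line(&y);
}
```

The trade-off is memory: each 8-byte counter now occupies 64 bytes, so this treatment is best reserved for the few variables that are written heavily by different threads.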
Key idea
False sharing arises when independent variables share a cache line and coherence bounces that line between cores. The fix is to pad and align hot data so that each contended variable sits on its own cache line.