Sharing that is not real
False sharing is a performance bug, not a correctness bug. Two threads update different variables that happen to sit on the same cache line. The hardware tracks coherence per line, not per variable, so each write invalidates the other core's copy of the whole line.
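To make the layout concrete, here is a minimal sketch of a common way this arises: an array of per-thread counters indexed by thread id. The names (PerThreadCounts, line_index, lines_touched) are hypothetical, and the 64-byte line size is an assumption about the target CPU. The struct is aligned to 64 bytes only so the demonstration is deterministic.

```rust
// Assumed cache-line size in bytes; check the target CPU before relying on it.
const LINE: usize = 64;

// Eight per-thread counters packed back to back (8 * 8 = 64 bytes).
// The alignment pins the array to the start of a line for this demo.
#[repr(C, align(64))]
struct PerThreadCounts {
    counts: [u64; 8],
}

// Index of the cache line that contains `addr`, under the assumed line size.
fn line_index(addr: usize) -> usize {
    addr / LINE
}

// Cache-line index of every counter slot, in slot order.
fn lines_touched(c: &PerThreadCounts) -> Vec<usize> {
    c.counts
        .iter()
        .map(|slot| line_index(slot as *const u64 as usize))
        .collect()
}

fn main() {
    let c = PerThreadCounts { counts: [0; 8] };
    let lines = lines_touched(&c);
    let all_same = lines.iter().all(|&l| l == lines[0]);
    // Prints "true": every thread's slot lives on the same line.
    println!("all eight slots on one line: {}", all_same);
}
```

If eight threads each wrote only their own slot, no variable would be shared, yet every write would target the same line.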
Why it hurts
The cache coherence protocol bounces the line back and forth between cores:
- Core A writes its variable and invalidates the line in core B.
- Core B writes its variable and invalidates it in core A.
- The threads share no data, yet they trade the line constantly, and each handoff stalls the writer on a coherence miss instead of a fast local cache hit.
The result can be a severalfold slowdown that gets worse, not better, as you add threads.
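The bouncing described above can be counted in a toy model. This is a deliberately simplified sketch, not a real coherence protocol: it assumes one owner per line, no shared or read-only states, a 64-byte line, and a worst-case strict alternation of writes. The function names are hypothetical.

```rust
use std::collections::HashMap;

// Assumed cache-line size in bytes.
const LINE: usize = 64;

// Counts how often a line changes owner for a sequence of (core, address)
// writes. Toy model: the last writer owns the line exclusively.
fn ownership_transfers(writes: &[(usize, usize)]) -> usize {
    let mut owner: HashMap<usize, usize> = HashMap::new(); // line index -> owning core
    let mut transfers = 0;
    for &(core, addr) in writes {
        let line = addr / LINE;
        // insert returns the previous owner of the line, if there was one.
        match owner.insert(line, core) {
            Some(prev) if prev != core => transfers += 1, // line pulled from another core
            _ => {}                                       // first touch, or already owned
        }
    }
    transfers
}

// Core 0 writes addr_a and core 1 writes addr_b, strictly alternating.
fn alternating_writes(addr_a: usize, addr_b: usize, rounds: usize) -> Vec<(usize, usize)> {
    (0..rounds).flat_map(|_| [(0, addr_a), (1, addr_b)]).collect()
}

fn main() {
    // Offsets 0 and 8 fall on the same 64-byte line; 0 and 64 do not.
    let shared = ownership_transfers(&alternating_writes(0, 8, 1000));
    let split = ownership_transfers(&alternating_writes(0, 64, 1000));
    println!("same line: {} transfers", shared); // 1999
    println!("separate lines: {} transfers", split); // 0
}
```

In this model every write after the first steals the line when the two variables share it, and none do once they are a line apart. Real hardware interleaves writes less regularly, so treat the counts as an upper-bound illustration rather than a measurement.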
Fixing it
- Pad or align hot per-thread fields so each sits on its own cache line, commonly 64 bytes (some CPUs use 128).
- Group read-mostly data away from frequently written data.
- Use thread-local accumulators and combine them at the end.
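Two of these fixes can be sketched as follows, under stated assumptions: a 64-byte line, and hypothetical names (PaddedCounter, count_with_padded_slots, sum_with_local_accumulators). It uses std::thread::scope, which needs Rust 1.63 or later. It shows the shape of the fixes, not a tuned implementation.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// One counter per cache line. The 64 is an assumption; some CPUs use 128.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

// Padding fix: each thread increments only its own padded slot.
fn count_with_padded_slots(threads: usize, iters: u64) -> u64 {
    let slots: Vec<PaddedCounter> =
        (0..threads).map(|_| PaddedCounter(AtomicU64::new(0))).collect();
    thread::scope(|s| {
        for slot in &slots {
            s.spawn(move || {
                for _ in 0..iters {
                    slot.0.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    }); // all scoped threads are joined here
    slots.iter().map(|c| c.0.load(Ordering::Relaxed)).sum()
}

// Local-accumulator fix: each thread sums its chunk into a local variable
// and the partial sums are combined once at the end.
fn sum_with_local_accumulators(data: &[u64], threads: usize) -> u64 {
    let threads = threads.max(1);
    let chunk = ((data.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    let mut local = 0u64; // lives in a register or on this thread's stack
                    for &x in part {
                        local += x;
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    // The alignment attribute also rounds the size up to a full line.
    assert_eq!(align_of::<PaddedCounter>(), 64);
    assert_eq!(size_of::<PaddedCounter>(), 64);

    println!("padded total: {}", count_with_padded_slots(4, 100_000)); // 400000
    let data: Vec<u64> = (1..=1_000).collect();
    println!("local total: {}", sum_with_local_accumulators(&data, 4)); // 500500
}
```

Padding trades memory for isolation: each 8-byte counter now occupies a full line. Local accumulators avoid that cost when the intermediate values do not need to be visible to other threads.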
Key idea
False sharing is contention over a cache line, not over shared data. Pad independent hot variables onto separate lines so the coherence traffic stops.