The Synchronization Cost
Parallel code rarely runs in perfect isolation. Whenever threads coordinate through locks, barriers, or shared atomics, they pay a synchronization cost that quietly adds serial time.
Where the cost hides
- A lock held by one thread forces the others to wait, turning that region into serial execution.
- Contention on a hot lock or atomic creates a queue of waiting threads and causes cache-line bouncing between cores.
- A barrier makes every thread wait for the slowest, so any imbalance shows up here.
These costs add directly to the serial fraction in Amdahl's law, lowering the speedup ceiling. Worse, they often grow with processor count, because more threads contend for the same resource.
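To make the effect on the ceiling concrete, here is a minimal sketch of Amdahl's law in Python. The serial fractions (5% inherent, rising to 10% once synchronization is counted) are hypothetical numbers chosen for illustration, and the function names are assumptions, not part of any library.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / (s + (1 - s) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def speedup_ceiling(serial_fraction: float) -> float:
    """Limit of the speedup as the processor count grows without bound: 1 / s."""
    return 1.0 / serial_fraction


# Hypothetical workload: 5% inherently serial work,
# plus another 5% of serial time added by synchronization.
base = 0.05
with_sync = 0.10

for p in (4, 16, 64):
    print(p, round(amdahl_speedup(base, p), 2), round(amdahl_speedup(with_sync, p), 2))

print("ceilings:", speedup_ceiling(base), speedup_ceiling(with_sync))
```

Under these assumed numbers, doubling the serial fraction halves the ceiling, and the gap between the two columns widens as the processor count rises.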
Reducing it
- Shrink critical sections so locks are held briefly.
- Prefer lock-free or per-thread structures, then combine the results with a reduction.
- Replace global barriers with finer-grained or local synchronization.
- Reduce false sharing by padding data so unrelated variables sit on different cache lines.
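The per-thread-then-reduce pattern from the list above can be sketched as follows, using Python's standard `threading` module. The function names and the chunked input are hypothetical. The sketch shows the structure only and is not a benchmark: the first version enters a critical section once per element, the second keeps all work private and combines the results once.

```python
import threading


def sum_with_shared_lock(chunks):
    """Every addition takes the same lock, so the updates run one at a time."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for value in chunk:
            with lock:  # critical section entered once per element
                total += value

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_with_partials(chunks):
    """Each thread writes only its own slot; one reduction combines them."""
    partials = [0] * len(chunks)

    def worker(index, chunk):
        local = 0
        for value in chunk:
            local += value  # private to this thread, no lock needed
        partials[index] = local  # single write to a slot no other thread touches

    threads = [
        threading.Thread(target=worker, args=(i, chunk))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # the reduction, done after all threads have finished


# Hypothetical input: four chunks of 1000 integers each.
chunks = [list(range(start, start + 1000)) for start in range(0, 4000, 1000)]
print(sum_with_shared_lock(chunks), sum_with_partials(chunks))
```

Both functions return the same total. The second needs no lock because each thread owns its slot in `partials`, and the only coordination left is the final join.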
Key idea
Locks, barriers, and contention add hidden serial time that lowers the speedup ceiling, so minimize critical sections and shared coordination.