Scaling is not linear
Naively, we expect that doubling the number of servers doubles throughput. In practice it does not, and the universal scalability law (USL) explains why. It says capacity is limited by two penalties that both grow with the number of workers.
The two penalties
- Contention is the cost of a serial section that workers must wait on, such as a shared lock or a single coordinator. This is the limit described by Amdahl's law, where the serial fraction caps speedup: throughput flattens toward a ceiling but does not fall.
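- Coherence is the cost of workers keeping shared state consistent, such as cache invalidation or cross-node coordination. Because it involves exchange between pairs of workers, it grows roughly with the square of the worker count, while contention grows only linearly. That is why it can make throughput decline as you add workers.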
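The two penalties can be put into numbers with a short sketch. It uses the usual statement of the law, C(N) = N / (1 + alpha(N - 1) + beta N(N - 1)), where C(N) is capacity relative to one worker. The names `usl_capacity`, `ALPHA`, and `BETA` are chosen here for illustration, and the coefficient values are assumptions rather than measurements.

```python
def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Relative capacity of n workers under the universal scalability law.

    alpha: contention coefficient (the serial fraction, as in Amdahl's law)
    beta:  coherence coefficient (cost of keeping worker pairs consistent)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


# Illustrative coefficients only (assumed, not measured from any system).
ALPHA = 0.05
BETA = 0.001

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64, 128):
        capacity = usl_capacity(n, ALPHA, BETA)
        print(f"{n:4d} workers -> {capacity:5.2f}x one worker")
```

With these assumed coefficients, capacity tops out at roughly 9 times a single worker near 31 workers and then falls. Setting `beta` to 0 reduces the formula to Amdahl's law, where capacity levels off at 1/alpha instead of declining.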
What this means for capacity planning
- Because of coherence, there is a peak number of workers beyond which adding more hurts.
- The fix is to reduce shared state and serial sections, not just to buy more hardware.
- Measure real throughput at several sizes and fit the curve rather than assuming a straight line.
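Fitting the curve can be sketched in plain Python. This is one possible approach, not a prescribed method: it assumes the first measurement is taken with a single worker, normalises by it, and uses the fact that N/C(N) - 1 = beta x^2 + (alpha + beta) x with x = N - 1, which is an ordinary least-squares problem with no intercept. The peak then follows from N* = sqrt((1 - alpha) / beta). The function names and the sample data are hypothetical.

```python
import math


def fit_usl(sizes, throughputs):
    """Estimate (alpha, beta) from throughput measured at several sizes.

    Assumes sizes[0] == 1 so that throughputs[0] can serve as the baseline.
    """
    if sizes[0] != 1:
        raise ValueError("first measurement must be for a single worker")
    base = throughputs[0]
    # Linearised form: y = a*x^2 + b*x with a = beta and b = alpha + beta.
    xs = [n - 1 for n in sizes[1:]]
    ys = [n / (t / base) - 1 for n, t in zip(sizes[1:], throughputs[1:])]
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    if det == 0:
        raise ValueError("need at least two distinct sizes above 1")
    # Solve the 2x2 normal equations with Cramer's rule.
    a = (sx2y * s2 - s3 * sxy) / det
    b = (s4 * sxy - s3 * sx2y) / det
    return b - a, a  # (alpha, beta)


def peak_workers(alpha, beta):
    """Worker count at which USL capacity is highest."""
    if beta <= 0:
        return math.inf  # no coherence penalty, so no finite peak
    return math.sqrt((1 - alpha) / beta)


if __name__ == "__main__":
    # Synthetic data generated from the formula itself, not real measurements.
    sizes = [1, 2, 4, 8, 16, 32]
    throughputs = [1000 * n / (1 + 0.05 * (n - 1) + 0.001 * n * (n - 1))
                   for n in sizes]
    alpha, beta = fit_usl(sizes, throughputs)
    print(f"alpha={alpha:.4f} beta={beta:.4f} peak={peak_workers(alpha, beta):.1f}")
```

Real measurements are noisy, so a fit can return a slightly negative coefficient. In that case, taking more data points or using a constrained nonlinear fit is likely a better choice than this linearised sketch.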
Key idea
The universal scalability law shows that contention caps throughput and coherence eventually reverses it, so beyond a peak, more workers make a system slower, not faster.