System Design

Capacity and the Universal Scalability Law

Why adding machines stops helping and can even make a system slower.

6 min read · advanced

Scaling is not linear

A naive model says that doubling the servers doubles the throughput. In practice it rarely does, and the universal scalability law explains why: capacity is limited by two penalties, contention and coherence, that both grow as workers are added.

The two penalties

  • Contention is the cost of a serial part that workers must wait on, like a shared lock or a single coordinator. This is the same limit described by Amdahl's law, in which the serial fraction caps the achievable speedup.
  • Coherence is the cost of workers keeping shared state consistent, such as cache invalidation or cross-node coordination. It scales with the number of worker pairs, so it grows roughly quadratically while contention grows only linearly. That is why it can make throughput decline as you add workers.
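
The two penalties can be put into numbers with the usual form of the law, C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α is the contention coefficient and β is the coherence coefficient. The sketch below evaluates that formula; the function name and the coefficient values are illustrative assumptions, not measurements from any real system.

```python
def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1)).

    alpha is the contention coefficient (serial fraction) and beta is the
    coherence coefficient. C(1) is 1, so the result is throughput expressed
    as a multiple of what a single worker delivers.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


# Illustrative coefficients only (assumed, not measured).
ALPHA = 0.05   # 5% of the work is serialized
BETA = 0.001   # small pairwise coordination cost

for n in (1, 2, 8, 32, 64, 128):
    contention_only = usl_capacity(n, ALPHA, 0.0)  # beta = 0 reduces to Amdahl's law
    with_coherence = usl_capacity(n, ALPHA, BETA)  # both penalties applied
    print(f"N={n:>3}  contention only={contention_only:6.2f}  "
          f"with coherence={with_coherence:6.2f}")
```

With these assumed values the contention-only column keeps rising but flattens below 1/α = 20, while the column that includes coherence is lower at 64 workers than at 32.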

What this means for capacity planning

  • Because of coherence, there is a peak number of workers beyond which adding more hurts.
  • The fix is to reduce shared state and serial sections, not just to buy more hardware.
  • Measure real throughput at several sizes and fit the curve rather than assuming a straight line.
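
One way to fit the curve from a handful of measurements is sketched below. It assumes the usual form of the law, C(N) = N / (1 + α(N − 1) + βN(N − 1)), with α for contention and β for coherence, and uses the rearrangement (N/C − 1)/(N − 1) = α + βN so that an ordinary least-squares line gives both coefficients. Setting the derivative of C(N) to zero gives the peak at N* = sqrt((1 − α)/β). The function names are hypothetical, and the "measurements" are synthetic values generated from the law itself so the fit can be checked; real load-test results would replace them.

```python
import math


def fit_usl(sizes, throughputs):
    """Estimate (alpha, beta) from throughput measured at several worker counts.

    Uses the linearized law: with C = X(N) / X(1),
        (N / C - 1) / (N - 1) = alpha + beta * N
    so a least-squares line through those points has alpha as its intercept
    and beta as its slope. Needs a measurement at N = 1 plus at least two
    other sizes.
    """
    baseline = dict(zip(sizes, throughputs))[1]  # throughput of one worker
    xs, ys = [], []
    for n, x in zip(sizes, throughputs):
        if n == 1:
            continue  # the N = 1 point only sets the baseline
        c = x / baseline
        xs.append(n)
        ys.append((n / c - 1) / (n - 1))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def peak_workers(alpha, beta):
    """Worker count at which the fitted curve is highest."""
    return math.sqrt((1 - alpha) / beta)


# Synthetic data (assumed alpha=0.05, beta=0.001, 1000 req/s for one worker).
sizes = [1, 2, 4, 8, 16, 32]
throughputs = [1000 * n / (1 + 0.05 * (n - 1) + 0.001 * n * (n - 1)) for n in sizes]

alpha, beta = fit_usl(sizes, throughputs)
print(f"alpha={alpha:.4f}  beta={beta:.5f}  "
      f"peak near {peak_workers(alpha, beta):.0f} workers")
```

The linearized fit is simple and needs only the standard library, but it can amplify noise in the small-N measurements. With noisy real data, a nonlinear least-squares fit of the original formula is a common alternative.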

Key idea

The universal scalability law shows that contention caps throughput and coherence eventually drives it down, so beyond a peak, more workers make a system slower, not faster.

Check yourself

1. What are the two forces in the universal scalability law?

2. Why can adding workers make throughput decline?

3. What is the right fix when scaling stalls due to this law?