Random with a twist
Pure random assignment is simple but lets imbalances grow; by chance one server can accumulate a long queue. Checking every server's load on each request is accurate but expensive at scale.
Power of two choices is the middle path: pick two backends at random, then send the request to whichever is less loaded.
Why two is enough
The surprising result is that comparing just two samples shrinks the worst-case queue length dramatically compared to one random pick.
- One random choice gives a maximum load that grows with the log of the number of servers.
- Two choices makes the maximum grow with the log of the log, a huge reduction.
- Adding a third or fourth sample helps only marginally.
Practical use
It is cheap because the balancer inspects only two servers, not all of them, so it scales to large fleets and distributed balancers that share no global state.
- Load can be measured as open connections or queue depth.
- It avoids the herd effect where many balancers all rush the single least loaded server.
Key idea
Sampling two random backends and choosing the lighter one yields nearly even load at tiny cost, capturing most of the benefit of checking every server.