Following the load
Round robin ignores how busy each server actually is. Least connections fixes this by tracking the number of active connections per backend and sending each new request to whichever has the fewest.
- It adapts when some requests run long and others finish fast.
- A slow backend naturally receives fewer new requests because its connection count stays high.
- It needs the balancer to keep a live count for every server.
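The bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not how any particular balancer implements it: the class name `LeastConnectionsBalancer`, its `acquire` and `release` methods, and the backend names are all hypothetical, and a real balancer would also need locking and health checks.

```python
class LeastConnectionsBalancer:
    """Illustrative only: tracks active connections per backend."""

    def __init__(self, backends):
        # One live counter per backend, all starting idle.
        self.active = {name: 0 for name in backends}

    def acquire(self):
        # Pick the backend with the fewest active connections.
        # Ties go to the first backend in insertion order.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when a request finishes so the count stays accurate.
        self.active[backend] -= 1


lb = LeastConnectionsBalancer(["a", "b", "c"])
first = lb.acquire()    # all idle, tie goes to "a"
second = lb.acquire()   # "b" and "c" are idle, tie goes to "b"
third = lb.acquire()    # only "c" is idle
lb.release("b")         # the request on "b" finishes
fourth = lb.acquire()   # "b" is now the least loaded
```

The `release` call is what makes the count "live": if it is skipped, a backend looks permanently busy and stops receiving traffic.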
Why it helps
Imagine one request triggers a thirty-second report while most finish in milliseconds. Round robin would keep feeding the busy server its share regardless. Least connections notices the backlog and steers new work elsewhere.
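A toy simulation makes the scenario concrete. It is a sketch under simplifying assumptions that are not in the original text: two servers, one request arriving per tick, the first request running for 30 ticks, and every later request finishing within a single tick. The function name `simulate` and its parameters are made up for illustration.

```python
def simulate(policy, ticks=30, long_duration=30):
    # Each inner list holds finish times of requests still running on that server.
    servers = [[], []]
    landed_on_busy = 0
    for t in range(ticks):
        # Drop requests that have finished by tick t.
        servers = [[f for f in s if f > t] for s in servers]
        if policy == "round_robin":
            target = t % 2
        else:
            # Least connections; ties go to the lower index.
            target = min(range(2), key=lambda i: len(servers[i]))
        duration = long_duration if t == 0 else 1
        # Server 0 receives the long request at tick 0; count the
        # short requests that are sent to it while that request runs.
        if t > 0 and target == 0:
            landed_on_busy += 1
        servers[target].append(t + duration)
    return landed_on_busy


rr = simulate("round_robin")        # 14 short requests hit the busy server
lc = simulate("least_connections")  # 0 short requests hit the busy server
```

Under these assumptions round robin sends every second request to the server that is still working on the report, while least connections sends none until the report finishes.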
Weighted variant
Weighted least connections divides each backend's connection count by its weight and routes to the lowest ratio, so a powerful node can hold more open connections before it looks busy.
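As a small worked example of that division, the sketch below compares connections per unit of weight. The helper `pick_weighted`, the backend names, and the weights are hypothetical, and weights are assumed to be positive.

```python
def pick_weighted(active, weights):
    # Fewer connections per unit of weight means more spare capacity.
    return min(active, key=lambda name: active[name] / weights[name])


active = {"big": 3, "small": 2}
weights = {"big": 4, "small": 1}

# big: 3 / 4 = 0.75, small: 2 / 1 = 2.0
choice = pick_weighted(active, weights)
```

Here `big` is chosen even though it holds more open connections, because relative to its weight it is less loaded than `small`.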
Caveats
The metric is open connections, not CPU or latency. A backend can have few connections yet be thrashing. Still, connection count is a cheap, useful proxy for load.
Key idea
Least connections routes each new request to the backend with the fewest open connections, so it adapts to uneven request durations in a way round robin cannot.