Why normal balancing breaks
An HTTP load balancer spreads short requests across nodes, so any imbalance corrects itself within seconds: each request finishes quickly and the next one can land on a different node. A persistent connection can stay on one node for hours, so a single bad placement decision lasts the whole session.
The consequences
- A node that receives a burst of connections during a deploy stays hot long after the burst ends.
- Round robin spreads connections evenly by count, not by work, so a node full of chatty clients overloads while a node of quiet clients idles.
- Restarting a node drops all its connections at once, causing a reconnect surge.
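A small simulation makes the round robin problem concrete. It is a sketch under stated assumptions: a two-node fleet, and per-client message rates invented purely for illustration. The names `round_robin` and `summarize` are hypothetical helpers, not part of any balancer's API.

```python
from itertools import cycle


def round_robin(clients, nodes):
    """Assign each client to the next node in rotation, ignoring its traffic."""
    placement = {node: [] for node in nodes}
    rotation = cycle(nodes)
    for client in clients:
        placement[next(rotation)].append(client)
    return placement


def summarize(placement):
    """Return node -> (connection count, total messages per second)."""
    return {
        node: (len(assigned), sum(rate for _, rate in assigned))
        for node, assigned in placement.items()
    }


# Hypothetical clients as (name, messages per second); chatty and quiet
# clients happen to arrive alternately.
clients = [("chatty-1", 500), ("quiet-1", 1), ("chatty-2", 500),
           ("quiet-2", 1), ("chatty-3", 500), ("quiet-3", 1)]

result = summarize(round_robin(clients, ["node-a", "node-b"]))
print(result)  # {'node-a': (3, 1500), 'node-b': (3, 3)}
```

Both nodes hold three connections, so a count-based view reports them as balanced, yet node-a carries 500 times the traffic of node-b. Because the connections persist, that skew lasts until the clients disconnect.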
Balancing strategies
- Balance on active connection count rather than request count.
- Drain a node before a deploy: stop sending it new connections and close its existing ones gradually, so its clients reconnect across the fleet over time.
- Have clients reconnect after a randomized (jittered) delay, so a node restart does not send every client back at the same moment and refill one node instantly.
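The three strategies above fit together in a short sketch. It assumes an in-process balancer that tracks open connections per node; the class, its methods, and the node names are hypothetical. The reconnect delay uses exponential backoff with "full jitter", one common way to add jitter, chosen here for illustration rather than prescribed by this section.

```python
import random


class LeastConnectionsBalancer:
    """Place each new connection on the accepting node with the fewest
    active connections. Hypothetical sketch, not a specific product's API."""

    def __init__(self, nodes):
        self.active = {node: 0 for node in nodes}  # node -> open connections
        self.draining = set()                      # nodes refusing new connections

    def connect(self):
        candidates = [n for n in self.active if n not in self.draining]
        if not candidates:
            raise RuntimeError("no node is accepting connections")
        # Ties go to the earliest node in insertion order.
        node = min(candidates, key=lambda n: self.active[n])
        self.active[node] += 1
        return node

    def disconnect(self, node):
        self.active[node] -= 1

    def drain(self, node):
        # Stop routing new connections to this node; its existing clients
        # are closed one at a time and reconnect to the rest of the fleet.
        self.draining.add(node)


def reconnect_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: a random delay between 0 and
    min(cap, base * 2**attempt) seconds, so clients of a restarted node
    spread their reconnects over time instead of arriving together."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


lb = LeastConnectionsBalancer(["node-a", "node-b", "node-c"])
placed = [lb.connect() for _ in range(6)]  # two connections per node

# Drain node-a before a deploy: close its clients one at a time and let
# each reconnect, so each lands on the least-loaded remaining node.
lb.drain("node-a")
while lb.active["node-a"] > 0:
    lb.disconnect("node-a")
    lb.connect()

print(lb.active)  # {'node-a': 0, 'node-b': 3, 'node-c': 3}
```

Closing one connection at a time lets the balancer see updated counts before each placement, which is why the drained clients split evenly between node-b and node-c. In a real deployment, each closed client would wait `reconnect_delay(attempt)` before reconnecting.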
Key idea
Balancing long-lived connections requires counting active connections and draining nodes gracefully, because a single placement decision sticks for the entire session instead of self-correcting the way short requests do.