Leaving without slamming the door
When you deploy, scale down, or replace a backend, abruptly cutting it off kills in-flight requests and drops user sessions. Connection draining, also called graceful removal, stops sending new traffic to a backend while letting existing requests complete.
How it works
- The backend is marked draining in the balancer.
- New connections and requests go to other backends.
- Existing connections finish naturally, up to a drain timeout.
- After the timeout, any remaining connections are forcibly closed.
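The four steps above can be sketched as a toy balancer. This is a minimal illustration, not any real load balancer's API: the `Backend` and `Balancer` classes, the `reap` method, and the 30-second default are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Backend:
    name: str
    draining: bool = False
    drain_deadline: Optional[float] = None
    active: set = field(default_factory=set)  # ids of in-flight requests


class Balancer:
    """Toy round-robin balancer with connection draining (illustrative only)."""

    def __init__(self, backends, drain_timeout=30.0, clock=time.monotonic):
        self.backends = list(backends)
        self.drain_timeout = drain_timeout  # assumed default, tune per workload
        self.clock = clock
        self._next = 0

    def start_draining(self, backend):
        # Step 1: mark the backend as draining and start the drain timer.
        backend.draining = True
        backend.drain_deadline = self.clock() + self.drain_timeout

    def route(self, request_id):
        # Step 2: new requests only go to backends that are not draining.
        candidates = [b for b in self.backends if not b.draining]
        if not candidates:
            raise RuntimeError("no backend available")
        backend = candidates[self._next % len(candidates)]
        self._next += 1
        backend.active.add(request_id)
        return backend

    def finish(self, backend, request_id):
        # Step 3: existing requests complete naturally.
        backend.active.discard(request_id)

    def reap(self):
        """Remove drained backends; return the request ids that were force-closed."""
        forced = []
        for b in list(self.backends):
            if not b.draining:
                continue
            timed_out = self.clock() >= b.drain_deadline
            if not b.active or timed_out:
                # Step 4: once the backend is idle or the timeout has passed,
                # remove it; anything still active is force-closed.
                forced.extend(sorted(b.active))
                b.active.clear()
                self.backends.remove(b)
        return forced
```

A control loop would call `reap()` periodically. A backend that empties out before the deadline is removed early, so the timeout only costs time when requests actually run long.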
Why the timeout matters
The drain timeout bounds how long the balancer waits for in-flight requests before closing them. Set it too short and long requests get cut; set it too long and a deploy crawls.
- A short timeout suits APIs whose requests complete in milliseconds.
- A longer timeout suits uploads, streaming, or batch calls.
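The trade-off can be made concrete with a little arithmetic. The sketch below is a rough estimate, assuming you have a sample of request durations and roll nodes out in fixed-size batches; the function names and the sample numbers are hypothetical.

```python
import math


def at_risk_fraction(durations_s, drain_timeout_s):
    """Fraction of sampled requests that could be cut by the timeout.

    A request no longer than the timeout always finishes, because it was
    already running when draining began. Longer requests are only at risk,
    not certain to be cut, so this is an upper bound.
    """
    if not durations_s:
        return 0.0
    at_risk = sum(1 for d in durations_s if d > drain_timeout_s)
    return at_risk / len(durations_s)


def worst_case_drain_wait_s(nodes, batch_size, drain_timeout_s):
    """Upper bound on total time spent draining in a rolling deploy."""
    batches = math.ceil(nodes / batch_size)
    return batches * drain_timeout_s


# Hypothetical sample: three quick API calls and one 45-second upload.
sample = [0.05, 0.2, 1.0, 45.0]
print(at_risk_fraction(sample, drain_timeout_s=30))                        # 0.25
print(worst_case_drain_wait_s(nodes=12, batch_size=3, drain_timeout_s=30))  # 120
```

Raising the timeout to 60 seconds would cover the upload in this sample, at the cost of doubling the worst-case drain wait.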
Coordinating with health
Draining pairs with health checks and deploy orchestration. A common pattern is to mark a node unhealthy or send it a shutdown signal, wait for the drain window, then terminate it. This enables zero-downtime rollouts because users on the old node finish cleanly while new traffic flows elsewhere.
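On the node itself, the pattern above might look like the following sketch. It assumes the balancer polls a health endpoint and treats a 503 as "stop sending traffic", and that the orchestrator sends SIGTERM on a POSIX system; `DrainCoordinator` and `run_until_sigterm` are hypothetical names, not part of any framework.

```python
import signal
import threading
from contextlib import contextmanager


class DrainCoordinator:
    """Tracks in-flight requests and health state for one node (hypothetical helper)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def health_status(self):
        # The health endpoint would return this; 503 tells the balancer to back off.
        with self._cond:
            return 503 if self._draining else 200

    @contextmanager
    def track_request(self):
        # Wrap each request handler so the coordinator knows what is in flight.
        with self._cond:
            self._in_flight += 1
        try:
            yield
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def drain(self, timeout_s):
        """Fail health checks, then wait for in-flight requests to finish.

        Returns True if everything finished, False if the timeout expired.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout_s
            )


def run_until_sigterm(coordinator, drain_timeout_s=30.0):
    """Block until SIGTERM, drain, and return a process exit code.

    Must be called from the main thread, because that is where Python
    allows signal handlers to be installed.
    """
    stop = threading.Event()
    # The handler only sets a flag; the real work happens outside it.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())
    stop.wait()
    clean = coordinator.drain(drain_timeout_s)
    return 0 if clean else 1
```

The orchestrator's own grace period before a hard kill should be longer than `drain_timeout_s`, otherwise the process is terminated before the drain window ends.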
Key idea
Connection draining stops new traffic to a retiring backend while letting active requests finish within a drain timeout, enabling zero-downtime deploys and scale-downs.