The problem with hard shutdowns
When a server is taken out of rotation for a deploy or scale-down, simply killing it drops every request it was handling and fails any new requests still being routed to it. Graceful connection draining removes a server cleanly, letting existing work finish while new work goes elsewhere.
The draining sequence
- The instance is marked unhealthy or out of service so the load balancer stops sending new requests to it.
- In-flight requests are allowed to complete, up to a bounded drain timeout.
- Once connections close or the timeout elapses, the instance shuts down.
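The sequence above can be sketched as a small in-process tracker. This is a minimal illustration rather than a production implementation: the Drainer class and its method names are hypothetical, and it assumes the server calls begin_request and end_request around each request and exposes healthy() through whatever health check the load balancer polls.

```python
import threading


class Drainer:
    """Hypothetical helper: tracks in-flight requests and runs a bounded drain."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def healthy(self):
        # Step 1: the health check reports False once draining starts,
        # so the load balancer stops routing new requests here.
        with self._cond:
            return not self._draining

    def begin_request(self):
        # Refuse new work after draining has begun.
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        # Called when a request finishes; wakes a waiting drain().
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout):
        """Steps 2 and 3: wait for in-flight work, bounded by `timeout` seconds.

        Returns True if every request finished, False if the timeout elapsed.
        Either way, the caller shuts the instance down afterwards.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout
            )
```

A shutdown handler might call drain with a timeout chosen for the service (30 seconds is an arbitrary illustrative value) and then exit regardless of the result, logging when the return value is False so that stuck requests stay visible.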
A good drain also sends a connection-close signal so keep-alive clients know to reconnect elsewhere rather than reuse a connection that is about to vanish. The drain timeout bounds how long a deploy waits, since a stuck request should not block shutdown forever. Draining is what lets rolling deployments and autoscaling happen without users seeing errors, making it a quiet but essential part of zero-downtime operations.
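As one way to picture the connection-close signal, the sketch below assumes HTTP/1.1, where a response header of Connection: close tells the client that the server will close the connection after this response. The function name with_drain_signal is hypothetical, and other protocols signal this differently (HTTP/2, for instance, uses a GOAWAY frame instead of a header).

```python
def with_drain_signal(headers, draining):
    """Return a copy of `headers`, adding Connection: close while draining.

    Assumes HTTP/1.1 semantics; the input dict is left unmodified.
    """
    result = dict(headers)
    if draining:
        # Tell keep-alive clients not to reuse this connection.
        result["Connection"] = "close"
    return result
```

A server would apply this to each response it writes once draining has started, so that clients open their next connection through the load balancer and land on a healthy instance.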
Key idea
Graceful draining stops new traffic to a server, lets in-flight requests finish within a timeout, then shuts down, enabling zero-downtime deploys and scaling.