Leaving gracefully
When a backend is being shut down, replaced during a deploy, or scaled in, you do not want to kill it the instant it stops being needed: any in-flight requests would fail. Connection draining, also called graceful shutdown, lets a backend finish its current work before it disappears.
How draining works
The load balancer marks the backend as draining, and then:
- It stops sending new requests to that instance.
- It lets existing requests run to completion, up to a timeout.
- Once active requests finish or the timeout expires, the instance is fully removed.
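The three steps above can be sketched as a small state machine. This is a minimal illustration, not how any particular load balancer is implemented: the names Backend, LoadBalancer, pick, start_drain, and reap are hypothetical, routing is plain round-robin, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    DRAINING = "draining"
    REMOVED = "removed"


@dataclass
class Backend:
    name: str
    state: State = State.ACTIVE
    in_flight: int = 0           # requests currently being served
    drain_deadline: float = 0.0  # only meaningful while DRAINING


class LoadBalancer:
    """Hypothetical load balancer used only to illustrate draining."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def pick(self):
        # Step 1: only ACTIVE backends receive new requests.
        active = [b for b in self.backends if b.state is State.ACTIVE]
        if not active:
            raise RuntimeError("no active backends")
        backend = active[self._next % len(active)]
        self._next += 1
        backend.in_flight += 1
        return backend

    def finish(self, backend):
        # Step 2: existing requests are allowed to complete normally.
        backend.in_flight -= 1

    def start_drain(self, backend, now, timeout):
        backend.state = State.DRAINING
        backend.drain_deadline = now + timeout

    def reap(self, now):
        # Step 3: remove a draining backend once it is idle
        # or its drain deadline has passed.
        for b in self.backends:
            if b.state is not State.DRAINING:
                continue
            if b.in_flight == 0 or now >= b.drain_deadline:
                b.in_flight = 0  # any remaining connections are closed
                b.state = State.REMOVED
```

In this sketch a backend that still has one request in flight stays in DRAINING across calls to reap, receives nothing from pick, and moves to REMOVED as soon as finish brings its count to zero or the deadline passes.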
Why the timeout matters
Draining needs a bound. A backend with a stuck or long-lived request should not block a deploy forever, so there is a maximum drain time after which any remaining connections are closed. Setting it too short cuts off legitimate requests; setting it too long slows rollouts. Teams tune it to the typical request duration plus a margin.
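A bounded wait of this kind can be sketched in a few lines. The functions below are hypothetical helpers, not part of any real load balancer or framework: suggest_timeout assumes "typical" means the median of observed request durations, and drain polls an in_flight callable that the caller is assumed to supply.

```python
import statistics
import time


def suggest_timeout(durations_s, margin_s):
    """Typical request duration plus a margin.

    Assumption: "typical" is taken to be the median of the samples.
    """
    return statistics.median(durations_s) + margin_s


def drain(in_flight, timeout_s, poll_s=0.05,
          clock=time.monotonic, sleep=time.sleep):
    """Wait until in_flight() reaches 0 or timeout_s elapses.

    Returns True if the backend drained cleanly, and False if the
    timeout expired with requests still running (the caller would
    then close the remaining connections).
    """
    deadline = clock() + timeout_s
    while in_flight() > 0:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_s, remaining))
    return True
```

The clock and sleep parameters exist only so the wait can be exercised with a fake clock; in normal use the defaults apply. A monotonic clock is used so that wall-clock adjustments cannot stretch or shorten the drain window.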
This pattern is what makes rolling deploys and autoscaling safe: instances come and go without users seeing reset connections.
Key idea
Connection draining stops new traffic to a leaving backend while letting in-flight requests finish, bounded by a timeout that keeps rollouts safe.