Health Checking, Deep

Knowing who is alive

A load balancer must send traffic only to backends that can serve it. Health checking continuously probes each backend, removes failing ones from the rotation, and returns them once they recover.

Active versus passive

Active checks send probes on a schedule, such as an HTTP request to a health endpoint or a TCP connect. The balancer decides health from the response.
Passive checks observe real traffic: a burst of errors or timeouts marks a backend unhealthy without a dedicated probe.

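Many systems combine both. Passive checks react quickly to failures under real traffic and cost no extra probes, while active checks cover idle backends and confirm that an evicted backend has recovered.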
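
As a rough illustration of the two styles, the sketch below pairs a TCP-connect probe (active) with a sliding-window error counter (passive). The names tcp_probe and PassiveMonitor, the window size, and the error-rate cutoff are all assumptions made for this example, not the API of any particular balancer.

```python
import socket
from collections import deque


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Active check: attempt a TCP connect and report whether it succeeded."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


class PassiveMonitor:
    """Passive check: judge health from the outcomes of real requests."""

    def __init__(self, window: int = 20, max_error_rate: float = 0.5):
        # Keep only the most recent `window` outcomes (True = success).
        self.outcomes = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def record(self, ok: bool) -> None:
        """Call once per real request with its success or failure."""
        self.outcomes.append(ok)

    def healthy(self) -> bool:
        if not self.outcomes:
            return True  # no traffic observed yet; assume healthy
        errors = sum(1 for ok in self.outcomes if not ok)
        return errors / len(self.outcomes) <= self.max_error_rate
```

A balancer that combines both might call tcp_probe on a timer for every backend and call record from its request path, treating a backend as usable only when both signals agree.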

Thresholds and flapping

A single failed probe is noisy. Balancers use thresholds:

Unhealthy threshold: how many consecutive failures before eviction.
Healthy threshold: how many consecutive successes before return.
Interval and timeout: how often to probe and how long to wait.

These hysteresis rules prevent flapping, where a borderline backend rapidly toggles in and out.
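
The threshold rules above can be sketched as a small state machine. This is a minimal illustration: the class name HealthState and the default thresholds of 3 and 2 are assumptions chosen for the example, and the interval and timeout are left to whatever loop drives the probes.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    unhealthy_threshold: int = 3  # consecutive failures before eviction
    healthy_threshold: int = 2    # consecutive successes before return
    healthy: bool = True
    streak: int = 0               # consecutive results that contradict `healthy`

    def observe(self, probe_ok: bool) -> bool:
        """Feed one probe result; return the (possibly updated) status."""
        if probe_ok == self.healthy:
            # Result agrees with the current state, so the run is broken.
            self.streak = 0
            return self.healthy
        self.streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self.streak >= limit:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy


state = HealthState()
probes = [True, False, True, False, False, False, True, True]
history = [state.observe(ok) for ok in probes]
# The isolated failure at index 1 is ignored; three failures in a row evict
# the backend, and two successes in a row bring it back.
```

Because any agreeing result resets the counter, a backend that alternates between success and failure never crosses either threshold, which is the flapping protection described above.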

Shallow versus deep

A shallow check confirms only that the process answers. A deep check also verifies that dependencies such as the database are reachable, catching backends that are up but unable to serve. Deep checks are more honest but can cascade failures: if a shared dependency blips, every backend fails its check at the same moment and the balancer may evict the whole pool.
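
The difference can be sketched as two functions a health endpoint might call. The names shallow_check and deep_check and the dependency probes passed in are hypothetical; a real service would substitute its own database or cache pings.

```python
from typing import Callable, Dict


def shallow_check() -> bool:
    """Shallow: if this code runs at all, the process is up and answering."""
    return True


def deep_check(dependencies: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Deep: run each dependency probe and report a per-dependency result."""
    results = {}
    for name, probe in dependencies.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = False
    return results


def failing_probe() -> bool:
    # Stand-in for a dependency that cannot be reached.
    raise ConnectionError("dependency unreachable")


report = deep_check({"database": lambda: True, "cache": failing_probe})
can_serve = all(report.values())
```

Returning per-dependency results rather than a single boolean leaves room for a policy decision, such as failing the check only for dependencies that are not shared by every backend, which is one way to limit the cascade risk noted above.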

Key idea