Health Checks And Readiness

Teaching load balancers and orchestrators when an instance can safely take traffic.

Two different questions

A health check answers a simple yes or no, but there are two distinct questions hiding inside it.

Liveness: is the process alive at all, or is it stuck and in need of a restart?
Readiness: can the process serve requests right now, with its caches warm and its dependencies reachable?

Confusing them causes outages. If a failed readiness check triggers a restart, a brief dependency blip can restart every instance at once, even though every process was alive and only needed to be taken out of rotation for a moment.
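The split can be sketched as two separate status functions, each driving a different action. This is a minimal illustration, not a real orchestrator API: the InstanceState fields, the function names, and the action strings are all hypothetical, and the 200/503 status codes are just a common convention for health endpoints.

```python
from dataclasses import dataclass


@dataclass
class InstanceState:
    """Hypothetical snapshot of what an instance knows about itself."""
    process_responsive: bool = True
    caches_warm: bool = False
    dependencies_reachable: bool = False


def liveness_status(state: InstanceState) -> int:
    # Liveness only asks whether the process itself is stuck.
    return 200 if state.process_responsive else 503


def readiness_status(state: InstanceState) -> int:
    # Readiness also requires warm caches and reachable dependencies.
    ready = (
        state.process_responsive
        and state.caches_warm
        and state.dependencies_reachable
    )
    return 200 if ready else 503


def orchestrator_action(state: InstanceState) -> str:
    # A failed liveness check restarts; a failed readiness check only
    # stops traffic, so a dependency blip never reboots the instance.
    if liveness_status(state) != 200:
        return "restart"
    if readiness_status(state) != 200:
        return "stop routing traffic"
    return "route traffic"


# A dependency blip: the process is fine, but the database is unreachable.
blip = InstanceState(process_responsive=True, caches_warm=True,
                     dependencies_reachable=False)
print(orchestrator_action(blip))  # stop routing traffic
```

Keeping the two functions separate means the restart decision never looks at dependency state at all.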

What a good check tests

A shallow check confirms that the process responds and that its event loop is not blocked.
A deep check verifies critical dependencies like the database are reachable.

Deep checks are powerful but dangerous: if every instance's health check queries the same database, one slow database can mark the whole fleet unhealthy and remove all capacity at once.
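The failure mode is easy to reproduce in a small simulation. The sketch below assumes a hypothetical fleet of four instances, a 500 ms check timeout, and a database responding in 2000 ms; none of these numbers come from a real system.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical fleet member; only tracks whether its process responds."""
    name: str
    process_responding: bool = True


def shallow_check(instance: Instance) -> bool:
    # Shallow: only the process itself is examined.
    return instance.process_responding


def deep_check(instance: Instance, db_latency_ms: float,
               timeout_ms: float = 500.0) -> bool:
    # Deep: the shared database must also answer within the timeout.
    return shallow_check(instance) and db_latency_ms <= timeout_ms


def healthy_count(fleet, check) -> int:
    return sum(1 for instance in fleet if check(instance))


fleet = [Instance(f"web-{i}") for i in range(4)]
slow_db_latency_ms = 2000.0  # one slow database, shared by every instance

shallow_capacity = healthy_count(fleet, shallow_check)
deep_capacity = healthy_count(
    fleet, lambda inst: deep_check(inst, slow_db_latency_ms)
)
print(shallow_capacity, deep_capacity)  # 4 0
```

Every process is still running, yet the deep check reports zero healthy instances because all of them depend on the same slow database.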

Startup behavior

New instances need time to warm up. A startup grace period lets an instance finish booting before liveness probes begin, so a slow start is not mistaken for a crash.
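A grace period amounts to ignoring liveness failures until enough time has passed since boot. The function below is a minimal sketch of that rule; the name should_restart and the 30-second default are assumptions chosen for illustration, not settings from any particular orchestrator.

```python
def should_restart(seconds_since_start: float, liveness_ok: bool,
                   grace_period_s: float = 30.0) -> bool:
    """Decide whether a failed liveness probe should trigger a restart."""
    # During the grace period the instance is still booting, so a
    # failing probe is treated as a slow start and ignored.
    if seconds_since_start < grace_period_s:
        return False
    # After the grace period, a failing probe means the process is stuck.
    return not liveness_ok


print(should_restart(5.0, liveness_ok=False))   # False: still warming up
print(should_restart(45.0, liveness_ok=False))  # True: stuck after boot
print(should_restart(45.0, liveness_ok=True))   # False: healthy
```

The grace period should exceed the slowest expected boot time, otherwise a slow start still ends in a restart loop.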

Key idea

Separate liveness from readiness so restarts and traffic routing react to the right signal.
