System Design

Health Checks And Readiness

How load balancers and orchestrators learn when an instance can safely take traffic.

Two different questions

A health check answers a simple yes or no, but there are two distinct questions hiding inside it.

  • Liveness: is the process alive at all, or is it stuck and in need of a restart?
  • Readiness: can the process serve requests right now, with its caches warm and its dependencies reachable?

Confusing them causes outages. If you restart an instance on a failed readiness check, a brief dependency blip fails readiness on every instance at the same moment, so the orchestrator reboots the whole fleet at once.
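To make that failure mode concrete, here is a minimal Python sketch of how an orchestrator might act on the two signals. `ProbeResult`, `decide`, `decide_conflated`, and the action strings are hypothetical names chosen for illustration, not any real orchestrator's API.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    # Hypothetical probe outcome for one instance.
    live: bool   # liveness: is the process alive and not stuck?
    ready: bool  # readiness: can it serve requests right now?


def decide(result: ProbeResult) -> str:
    """Keep the signals separate: only a liveness failure causes a restart."""
    if not result.live:
        return "restart"
    if not result.ready:
        return "remove_from_pool"  # keep it running, just stop routing to it
    return "route_traffic"


def decide_conflated(result: ProbeResult) -> str:
    """Anti-pattern: any failed check triggers a restart."""
    return "route_traffic" if result.live and result.ready else "restart"


# A brief database blip fails readiness on every instance at the same moment.
fleet = [ProbeResult(live=True, ready=False) for _ in range(3)]
print([decide(r) for r in fleet])            # every instance: 'remove_from_pool'
print([decide_conflated(r) for r in fleet])  # every instance: 'restart'
```

With the separated policy, the instances rejoin the pool as soon as readiness passes again; with the conflated one, they all reboot together and come back with cold caches.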

What a good check tests

  • A shallow check confirms the process responds and its event loop is not blocked.
  • A deep check verifies critical dependencies like the database are reachable.
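Both kinds of check can be sketched with `asyncio`, assuming an async Python service. `ping_db` and the fake databases below are hypothetical stand-ins for a real dependency call (for example a `SELECT 1` round trip), and the 0.1 s lag and 0.5 s timeout thresholds are illustrative values, not recommendations.

```python
import asyncio
from typing import Awaitable, Callable


async def shallow_check(max_lag_s: float = 0.1) -> bool:
    """Confirm the process responds and its event loop is not starved.

    If the loop were fully blocked, this coroutine would never run and the
    prober's own timeout would fire; the lag measurement also catches a loop
    that is running but saturated. No dependencies are touched.
    """
    loop = asyncio.get_running_loop()
    start = loop.time()
    await asyncio.sleep(0)  # yield once; resumes after other ready callbacks
    return loop.time() - start < max_lag_s


async def deep_check(ping_db: Callable[[], Awaitable[None]],
                     timeout_s: float = 0.5) -> bool:
    """Also verify a critical dependency, bounded by a timeout."""
    try:
        await asyncio.wait_for(ping_db(), timeout=timeout_s)
        return True
    except (asyncio.TimeoutError, ConnectionError):
        return False


# Hypothetical stand-ins for a database round trip.
async def healthy_db() -> None:
    await asyncio.sleep(0.01)


async def slow_db() -> None:
    await asyncio.sleep(5)  # wait_for cancels this once the timeout expires


async def down_db() -> None:
    raise ConnectionError("connection refused")


async def main() -> None:
    results = await asyncio.gather(
        shallow_check(),
        deep_check(healthy_db),
        deep_check(slow_db),
        deep_check(down_db),
    )
    print(results)  # [True, True, False, False]


asyncio.run(main())
```

Bounding the dependency call with a timeout keeps a hung database from also hanging the health endpoint itself.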

Deep checks are powerful but dangerous: if every instance's health check depends on the same database, one slow database can mark the whole fleet unhealthy and remove all serving capacity at once.
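The danger comes from the failures being correlated rather than independent, which a toy model makes visible. This Python sketch assumes a deep readiness check passes only if the shared database answers within the probe timeout; `instances_in_rotation` and its parameters are hypothetical.

```python
def instances_in_rotation(num_instances: int, db_latency_s: float,
                          probe_timeout_s: float, deep: bool) -> int:
    """Count instances a load balancer would keep routing traffic to."""
    if not deep:
        return num_instances  # shallow checks never look at the database
    # Every instance probes the SAME database and sees the same latency,
    # so the outcome is identical across the fleet: all or nothing.
    db_ok = db_latency_s <= probe_timeout_s
    return num_instances if db_ok else 0


print(instances_in_rotation(10, db_latency_s=0.05, probe_timeout_s=1.0, deep=True))  # 10
print(instances_in_rotation(10, db_latency_s=2.0, probe_timeout_s=1.0, deep=True))   # 0
print(instances_in_rotation(10, db_latency_s=2.0, probe_timeout_s=1.0, deep=False))  # 10
```

In the second call the database is merely slow, yet the load balancer is left with zero instances to send traffic to, even though every process is running fine.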

Startup behavior

New instances need time to warm up. A startup grace period lets a pod boot before liveness probes start, so slow starts are not mistaken for crashes.
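A minimal Python sketch of a prober that honors a startup grace period. `LivenessProber`, its `tick` method, and the injectable `clock` are hypothetical, and the model assumes any failed liveness probe after the grace period triggers a restart. Real orchestrators expose their own settings for this; Kubernetes, for example, offers `startupProbe` and `initialDelaySeconds`.

```python
import time
from typing import Callable


class LivenessProber:
    """Illustrative prober that skips liveness probes during a grace period.

    `probe` is a hypothetical callable returning True while the process is
    alive; `clock` is injectable so the behavior can be simulated without
    actually sleeping.
    """

    def __init__(self, grace_period_s: float, probe: Callable[[], bool],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_period_s = grace_period_s
        self.probe = probe
        self.clock = clock
        self.started_at = clock()  # when the instance was launched

    def tick(self) -> str:
        if self.clock() - self.started_at < self.grace_period_s:
            return "skip"  # still booting: do not probe, so a slow start is not a crash
        return "ok" if self.probe() else "restart"


# Simulate an app that needs 20 s to boot, using a fake clock.
now = 0.0
boot_time_s = 20.0
prober = LivenessProber(
    grace_period_s=30.0,
    probe=lambda: now >= boot_time_s,
    clock=lambda: now,
)
for now in (5.0, 15.0, 25.0, 35.0):
    print(now, prober.tick())  # skip, skip, skip, then ok at 35.0
```

With a 10 s grace period instead, the probe at 15 s would fail and the still-booting instance would be restarted, likely over and over, since each restart begins the 20 s boot again.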

Key idea

Separate liveness from readiness so restarts and traffic routing react to the right signal.

Check yourself

1. What does a readiness check decide?

2. Why can a deep dependency check be dangerous?