System Design

Heartbeating and Timeouts

Periodic liveness signals and the timeout logic that decides a peer is gone.

4 min read · intro

Detecting that a node is alive

A heartbeat is a small message that a node sends at a regular interval to say it is still alive. A peer that hears no heartbeat from the sender for longer than a chosen timeout concludes that the sender has failed.
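Here is a minimal Python sketch of the receiving side. It assumes a single monitor that records each peer's last heartbeat time. `HeartbeatMonitor`, `on_heartbeat`, and `is_alive` are hypothetical names. The current time is passed in as `now`, so the logic can be checked without real timers.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical helper: tracks when each peer was last heard from."""
    timeout: float  # seconds of silence allowed before a peer is declared dead
    last_seen: dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, peer: str, now: float) -> None:
        # Every received beat resets that peer's silence clock.
        self.last_seen[peer] = now

    def is_alive(self, peer: str, now: float) -> bool:
        # A peer we have never heard from is treated as not alive.
        if peer not in self.last_seen:
            return False
        # Alive while the silence since the last beat is within the timeout.
        return now - self.last_seen[peer] <= self.timeout


monitor = HeartbeatMonitor(timeout=3.0)
monitor.on_heartbeat("node-b", now=10.0)
print(monitor.is_alive("node-b", now=12.5))  # True: 2.5 s of silence
print(monitor.is_alive("node-b", now=13.5))  # False: 3.5 s of silence
```

In a real system `now` would come from a monotonic clock, and a background loop would call `is_alive` periodically.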

The two knobs

  • Heartbeat interval: how often beats are sent. A shorter interval detects failures sooner but sends more messages.
  • Timeout: how long a peer may stay silent before it is declared dead. It should span several intervals so that a single lost or delayed beat does not cause a false alarm.
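This small deterministic simulation shows why one interval is not enough. It assumes that beat k is sent at k × `interval`, that beats arrive instantly unless they are lost, and that the receiver suspects the sender once the silence exceeds `timeout`. `false_alarm` is a hypothetical helper written only for illustration.

```python
def false_alarm(interval: float, timeout: float, lost: set[int], beats: int = 20) -> bool:
    """Return True if a sender that never fails would still be declared dead.

    Illustrative assumptions: beat k is sent at k * interval and delivered
    instantly unless k is in `lost`. Only the simulated window is checked.
    """
    arrivals = [k * interval for k in range(beats) if k not in lost]
    # The silences the receiver sees are the gaps between consecutive received beats.
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return any(gap > timeout for gap in gaps)


# Losing beat 5 with a 1 s interval leaves a 2 s silence (from t=4 to t=6).
print(false_alarm(interval=1.0, timeout=1.5, lost={5}))  # True: 2.0 > 1.5
print(false_alarm(interval=1.0, timeout=3.0, lost={5}))  # False: 2.0 <= 3.0
```

With a 1 s interval, a single dropped beat already creates a 2 s silence, so any timeout below that flags a healthy node. A timeout of a few intervals absorbs the loss.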

The unavoidable trade

You cannot have both fast detection and few false positives. A short timeout catches failures quickly but can mistake a slow or congested network for a dead node. A long timeout rarely raises false alarms, but a real failure goes unnoticed for longer. Tuning means choosing the balance that suits your environment.
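You can make the trade concrete by scoring candidate timeouts against silences observed from a peer that never actually failed. The gap values below are made-up illustrative numbers, not measurements. The sketch also assumes negligible network delay, so a crash just after a beat goes unnoticed for the full timeout. `tradeoff` is a hypothetical helper.

```python
def tradeoff(timeout: float, gaps: list[float]) -> tuple[int, float]:
    """Return (false_alarms, worst_case_detection) for one candidate timeout.

    `gaps` are silences between beats from a healthy peer, so every gap longer
    than the timeout is a false alarm. Assuming negligible network delay, a crash
    right after a beat is noticed only `timeout` seconds later.
    """
    false_alarms = sum(1 for gap in gaps if gap > timeout)
    worst_case_detection = timeout
    return false_alarms, worst_case_detection


# Hypothetical gaps (seconds) for a 1 s interval on a jittery network.
observed_gaps = [1.0, 1.1, 0.9, 1.0, 2.4, 1.0, 1.2, 3.1, 1.0, 0.9]

for timeout in (1.5, 2.5, 3.5):
    alarms, delay = tradeoff(timeout, observed_gaps)
    print(f"timeout={timeout}s  false alarms={alarms}  worst-case detection={delay}s")
# timeout=1.5s  false alarms=2  worst-case detection=1.5s
# timeout=2.5s  false alarms=1  worst-case detection=2.5s
# timeout=3.5s  false alarms=0  worst-case detection=3.5s
```

As the timeout grows, false alarms fall while the worst-case detection time rises, which is the trade described above.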

Key idea

Heartbeats are periodic liveness pings. The timeout controls the trade-off between detecting failures quickly and avoiding false alarms caused by slow networks.

Check yourself

1. Why should the timeout be longer than a single heartbeat interval?

2. What trade does the timeout length control?