System Design

The Phi Accrual Failure Detector

A failure detector that outputs a suspicion level instead of a yes-or-no verdict.

5 min read · advanced

Beyond up or down

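A classic failure detector declares a node up or down using a fixed timeout: if no heartbeat arrives within T seconds, the node is down. Network latency varies, though, so any single timeout is either too short, raising false alarms whenever the link slows down, or too long, delaying detection of real crashes. The phi accrual detector instead outputs a continuous suspicion value, phi.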
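For contrast, here is a minimal sketch of the fixed-timeout approach. The function name and parameters are hypothetical. The verdict is a bare boolean, and whether a 1.2-second gap counts as a failure depends entirely on a timeout someone chose in advance.

```python
def fixed_timeout_verdict(now: float, last_heartbeat: float, timeout: float) -> bool:
    """Hypothetical classic detector: True means "up", False means "down"."""
    # The only inputs are the gap since the last heartbeat and one constant.
    return (now - last_heartbeat) <= timeout


# The same 1.2 s silence yields opposite verdicts under two hand-picked timeouts.
print(fixed_timeout_verdict(now=11.2, last_heartbeat=10.0, timeout=1.0))  # False
print(fixed_timeout_verdict(now=11.2, last_heartbeat=10.0, timeout=3.0))  # True
```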

How phi is computed

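The detector keeps a sliding window of recent heartbeat inter-arrival intervals and fits a distribution to them. The original design uses a normal distribution, estimated from the mean and variance of the intervals in the window.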
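Below is a minimal sketch of this bookkeeping. It assumes a normal fit, as in the original design. The class `HeartbeatHistory`, its default window size of 100, and its method names are all hypothetical. Because only the last N intervals are kept, the estimate follows recent network conditions.

```python
from __future__ import annotations

import statistics
from collections import deque


class HeartbeatHistory:
    """Hypothetical helper: sliding window of recent heartbeat inter-arrival times."""

    def __init__(self, window_size: int = 100) -> None:
        # A bounded deque drops the oldest interval once the window is full.
        self.intervals: deque[float] = deque(maxlen=window_size)
        self.last_arrival: float | None = None

    def record(self, arrival_time: float) -> None:
        """Store the gap between this heartbeat and the previous one."""
        if self.last_arrival is not None:
            self.intervals.append(arrival_time - self.last_arrival)
        self.last_arrival = arrival_time

    def fit(self) -> tuple[float, float]:
        """Fit a normal distribution: return (mean, standard deviation) of the window."""
        # Raises statistics.StatisticsError if no interval has been recorded yet.
        return statistics.fmean(self.intervals), statistics.pstdev(self.intervals)


history = HeartbeatHistory(window_size=100)
for t in [0.0, 1.0, 2.0, 3.1, 4.0]:
    history.record(t)
mean, stdev = history.fit()
print(f"mean={mean:.3f}s stdev={stdev:.3f}s")  # mean=1.000s stdev=0.071s
```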

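  • At any moment, it computes phi from the time elapsed since the last heartbeat: phi = -log10(P_later), where P_later is the probability, under the fitted distribution, that a live node's heartbeat would arrive even later than that. The longer the silence, the smaller P_later and the larger phi.
  • A small phi means the silence is normal for this link; a large phi means it is unlikely unless the node has failed. For example, phi = 1 means a live node would be this late about 10% of the time, phi = 2 about 1%, and phi = 3 about 0.1%.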
  • Each application picks its own threshold on phi to declare failure.
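With a normal distribution assumed, the steps in the list above come down to a few lines of code. In this minimal sketch, the function names, the `min_stdev` floor, and the example thresholds of 2 and 8 are illustrative assumptions. P_later(t) is the normal tail probability 1 - F(t). It is computed with `math.erfc`, which stays accurate far into the tail. Then phi(t) = -log10(P_later(t)).

```python
import math


def phi(elapsed: float, mean: float, stdev: float, min_stdev: float = 1e-3) -> float:
    """Suspicion level for a node whose last heartbeat arrived `elapsed` seconds ago.

    Assumes inter-arrival times are normal with the given mean and standard deviation.
    """
    # Floor the spread so a perfectly regular history does not divide by zero.
    sigma = max(stdev, min_stdev)
    # P_later: probability that a live node's heartbeat would arrive even later than this.
    p_later = 0.5 * math.erfc((elapsed - mean) / (sigma * math.sqrt(2)))
    # Clamp before taking the log: erfc underflows to 0.0 for extreme delays.
    return -math.log10(max(p_later, 1e-300))


def is_suspected(phi_value: float, threshold: float) -> bool:
    """Each caller supplies its own threshold on the same phi signal."""
    return phi_value >= threshold


# History: heartbeats every 1.0 s on average, standard deviation 0.1 s.
# The node has now been silent for 1.3 s, three standard deviations past the mean.
score = phi(elapsed=1.3, mean=1.0, stdev=0.1)
print(round(score, 2))                     # 2.87
print(is_suspected(score, threshold=2.0))  # True: an aggressive caller acts now
print(is_suspected(score, threshold=8.0))  # False: a conservative caller waits
```

Phi is on a logarithmic scale. Each extra unit of phi means the delay is ten times less likely for a live node, so a threshold of 8 is far more conservative than a threshold of 2.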

Why this is better

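The mean and spread of the distribution are learned from observed intervals, so phi adapts to network jitter. On a link whose heartbeats already vary widely, a late beat raises phi only gently instead of triggering an instant false alarm. And because the output is a score rather than a verdict, different callers can apply different thresholds to the same signal.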
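To see the adaptation, this sketch applies a single 1.5-second silence to two histories. Both have a 1.0-second mean, but their jitter differs. The sketch assumes a normal fit. The `phi` helper and the `min_stdev` floor are hypothetical, and the interval data is made up for illustration.

```python
from __future__ import annotations

import math
import statistics


def phi(elapsed: float, intervals: list[float], min_stdev: float = 1e-3) -> float:
    """Phi for a silence of `elapsed` seconds, fitting a normal to recent intervals."""
    mean = statistics.fmean(intervals)
    sigma = max(statistics.pstdev(intervals), min_stdev)
    # Normal tail probability that a live node's heartbeat is at least this late.
    p_later = 0.5 * math.erfc((elapsed - mean) / (sigma * math.sqrt(2)))
    return -math.log10(max(p_later, 1e-300))


# Made-up histories: both average 1.0 s between heartbeats.
stable_link = [0.95, 1.05, 1.0, 0.95, 1.05]  # stdev ~0.045 s
jittery_link = [0.6, 1.4, 0.8, 1.2, 1.0]     # stdev ~0.28 s

silence = 1.5  # seconds since the last heartbeat, identical for both links
phi_stable = phi(silence, stable_link)
phi_jittery = phi(silence, jittery_link)

threshold = 8.0
print(phi_stable >= threshold)   # True: this gap is extreme for a steady link
print(phi_jittery >= threshold)  # False: this gap is ordinary for a jittery link
```

With the same threshold, the steady link's silence is flagged while the jittery link's identical silence is not. The learned spread alone produces that difference.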

Key idea

The phi accrual detector turns failure detection into a tunable suspicion score derived from heartbeat history, letting each application choose how aggressively to suspect a node.

Check yourself

1. What does the phi accrual detector output instead of a binary verdict?

2. Why does phi adapt better to a jittery network than a fixed timeout?