System Design

Consistent Snapshots: Chandy-Lamport

Capture a global state without stopping the system.

6 min read · advanced

The challenge

How do you record a consistent global snapshot of a running distributed system when there is no shared clock and messages are in flight? The Chandy-Lamport algorithm does this without pausing the system.

The marker mechanism

The algorithm uses a special marker message on FIFO channels:

  • An initiator records its own state, then sends a marker on every outgoing channel before sending any further application messages
  • When a node first sees a marker, it records its own state, records the channel the marker arrived on as empty, and sends markers on all its outgoing channels
  • On every other incoming channel, a node records the messages that arrive between recording its own state and receiving the marker on that channel; these are the in-flight messages
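The marker rules can be sketched as a small single-threaded simulation. This is a minimal illustration, not a reference implementation: the `Node` class, its method names, and the token-transfer workload are all assumptions made for the example, and each channel is modelled as an in-memory FIFO queue.

```python
from collections import deque

MARKER = object()  # sentinel standing in for the marker message


class Node:
    """One process in a toy simulation (all names here are hypothetical)."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance       # application state: a token count
        self.inbox = {}              # sender name -> FIFO queue (incoming channel)
        self.peers = {}              # receiver name -> Node (outgoing channels)
        self.recorded_state = None   # local state captured by the snapshot
        self.channel_state = {}      # sender name -> in-flight messages recorded
        self.recording = set()       # incoming channels still being recorded

    def connect(self, other):
        """Open a one-way FIFO channel from self to other."""
        self.peers[other.name] = other
        other.inbox[self.name] = deque()

    def send(self, to, amount):
        """Application message: move tokens to a peer."""
        self.balance -= amount
        self.peers[to].inbox[self.name].append(amount)

    def start_snapshot(self):
        """Initiator rule: record state, then send markers."""
        self._record_and_flood(marker_from=None)

    def _record_and_flood(self, marker_from):
        self.recorded_state = self.balance
        self.channel_state = {src: [] for src in self.inbox}
        # The channel that delivered the first marker is recorded as empty.
        self.recording = set(self.inbox) - {marker_from}
        for peer in self.peers.values():
            peer.inbox[self.name].append(MARKER)

    def deliver(self, src):
        """Process the next message waiting on the channel from src."""
        msg = self.inbox[src].popleft()
        if msg is MARKER:
            if self.recorded_state is None:
                self._record_and_flood(marker_from=src)  # first marker seen
            else:
                self.recording.discard(src)              # stop recording src
        else:
            self.balance += msg
            if src in self.recording:
                self.channel_state[src].append(msg)      # in-flight message


# Two nodes holding 15 tokens in total.
a, b = Node("a", 10), Node("b", 5)
a.connect(b)
b.connect(a)

b.send("a", 3)       # 3 tokens are in flight from b to a
a.start_snapshot()   # a records 10 and sends a marker to b
a.send("b", 4)       # sent after the marker, so outside the snapshot
b.deliver("a")       # b sees the marker first: records 2, sends a marker to a
a.deliver("b")       # the 3 tokens arrive while a is recording channel b -> a
a.deliver("b")       # the marker arrives: a stops recording channel b -> a
b.deliver("a")       # the 4 tokens arrive after b has recorded its state

snapshot_total = (
    a.recorded_state
    + b.recorded_state
    + sum(sum(msgs) for n in (a, b) for msgs in n.channel_state.values())
)
```

In this run the snapshot holds 10 tokens at `a`, 2 at `b`, and 3 on the channel from `b` to `a`, which adds up to the 15 tokens in the system, even though the nodes recorded their states at different moments and kept exchanging messages throughout.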

Why it is consistent

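The collected snapshot is a state the system could have been in, even if it was never simultaneously true everywhere. The snapshot never includes the receipt of a message without also including its send, because on a FIFO channel any message sent after the sender recorded its state arrives behind the marker. Channel states capture messages that were sent before the cut but not yet received, so nothing is lost or double-counted.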
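The consistency condition can be checked mechanically on a recorded event log. The sketch below is only an illustration: the log format, the `cut` representation (how many leading events of each process fall inside the snapshot), and the function names are assumptions for the example.

```python
def split_messages(events, cut):
    """Return (sent, received) message ids among the events inside the cut.

    events: process name -> ordered list of (kind, msg_id) tuples,
            where kind is "send" or "recv".
    cut:    process name -> number of leading events included in the snapshot.
    """
    sent, received = set(), set()
    for proc, log in events.items():
        for kind, msg_id in log[: cut[proc]]:
            if kind == "send":
                sent.add(msg_id)
            elif kind == "recv":
                received.add(msg_id)
    return sent, received


def is_consistent_cut(events, cut):
    """A cut is consistent if every message received inside it was also sent inside it."""
    sent, received = split_messages(events, cut)
    return received <= sent


def in_flight(events, cut):
    """Messages sent inside the cut but not yet received: the channel state."""
    sent, received = split_messages(events, cut)
    return sent - received


# p sends m1 and later receives m2; q sends m2 and later receives m1.
events = {
    "p": [("send", "m1"), ("recv", "m2")],
    "q": [("send", "m2"), ("recv", "m1")],
}

ok_cut = {"p": 1, "q": 2}    # m1 sent and received, m2 sent but not yet received
bad_cut = {"p": 2, "q": 0}   # p has received m2, but q has not sent it yet
```

Here `is_consistent_cut(events, ok_cut)` is true and `in_flight(events, ok_cut)` contains only `m2`, while `bad_cut` is rejected because it shows a message arriving that, according to the snapshot, was never sent.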

Where it is used

  • Checkpointing for crash recovery
  • Deadlock and termination detection
  • Stream processing systems such as Apache Flink use a marker-based variant (checkpoint barriers) for exactly-once state consistency

Assumptions

  • Channels are reliable and FIFO
  • Every node is reachable from the initiator, so markers eventually propagate to all nodes

Key idea

Chandy-Lamport records a consistent global snapshot by flooding marker messages on FIFO channels, capturing each node's state plus the in-flight messages, all without halting the system.

Check yourself

1. What does a node do when it first receives a marker?

2. What channel assumption does Chandy-Lamport rely on?

3. Why is the captured snapshot called consistent?