Concurrency

Deterministic Replay Debugging

Record the nondeterministic choices once, then replay the exact same buggy run.

The reproducibility problem

A concurrency bug may appear once in a thousand runs, because it depends on how the threads happen to interleave. Without a way to reproduce that interleaving, you cannot study the failure under a debugger.

What replay records

Deterministic replay records the few sources of nondeterminism so a later run follows the same path:

  • the order threads acquired locks
  • the results of nondeterministic reads such as inputs and timers
  • the interleaving of accesses to shared memory

During replay the recorded log forces every choice, so the failing run repeats exactly.
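The record-then-force idea can be sketched with a toy cooperative scheduler. This is a minimal illustration and not how any particular replay tool works: the names `worker`, `run`, `record`, and `replay` are hypothetical, and generators stand in for threads so that every context switch is an explicit scheduling choice that can be logged.

```python
import random

def worker(shared, steps):
    """Non-atomic increment: read, yield to the scheduler, then write back."""
    for _ in range(steps):
        seen = shared["counter"]       # read the shared value
        yield                          # a context switch may happen here
        shared["counter"] = seen + 1   # write back, possibly losing an update

def run(choose, n_workers=3, steps=4):
    """Run the workers; `choose` decides which runnable worker goes next."""
    shared = {"counter": 0}
    tasks = [worker(shared, steps) for _ in range(n_workers)]
    runnable = list(range(n_workers))
    while runnable:
        tid = choose(runnable)
        try:
            next(tasks[tid])           # advance the chosen worker one step
        except StopIteration:
            runnable.remove(tid)       # worker finished
    return shared["counter"]

def record(seed=None):
    """Record mode: pick workers at random and log every choice."""
    rng = random.Random(seed)
    log = []
    def choose(runnable):
        tid = rng.choice(runnable)
        log.append(tid)
        return tid
    return run(choose), log

def replay(log):
    """Replay mode: ignore randomness and force the logged choices."""
    choices = iter(log)
    def choose(runnable):
        tid = next(choices)
        assert tid in runnable, "log diverged from execution"
        return tid
    return run(choose)

if __name__ == "__main__":
    result, log = record()
    print("recorded result:", result, "with", len(log), "scheduling choices")
    print("replayed result:", replay(log))
```

The final counter differs between recorded runs because updates get lost under some interleavings, but replaying a given log always reproduces that run's value.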

The cost tradeoff

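Recording everything is expensive. Practical systems log only the nondeterministic events, such as scheduling decisions and external inputs, and re-execute the deterministic computation between them instead of recording it. This keeps the log small while still reproducing the bug.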
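As a rough sketch of why the log stays small, the example below records only the results of nondeterministic reads (timer values here) and recomputes everything else during replay. The `Recorder`, `Replayer`, and `program` names are assumptions made for illustration; real systems intercept these reads at the system-call or runtime level.

```python
import time

class Recorder:
    """Record mode: perform the real read and log its result."""
    def __init__(self):
        self.log = []

    def read(self, source):
        value = source()
        self.log.append(value)
        return value

class Replayer:
    """Replay mode: return logged results without calling the real source."""
    def __init__(self, log):
        self._values = iter(log)

    def read(self, source):
        return next(self._values)

def program(env):
    """Two nondeterministic reads around a long deterministic computation."""
    start = env.read(time.monotonic_ns)    # nondeterministic: logged
    total = 0
    for i in range(10_000):                # deterministic: never logged
        total += (start + i) % 7
    end = env.read(time.monotonic_ns)      # nondeterministic: logged
    return total, end - start

if __name__ == "__main__":
    rec = Recorder()
    original = program(rec)
    replayed = program(Replayer(rec.log))
    print("log entries:", len(rec.log))
    print("replay matches:", replayed == original)
```

The loop runs ten thousand iterations, yet the log holds only two entries, because the loop's result is fully determined by the logged timer value.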

Key idea

Replay turns a rare nondeterministic failure into a repeatable one by logging the scheduling and input choices, then forcing those exact choices on every replay.

Check yourself

1. Why is deterministic replay useful for concurrency bugs?

2. What do practical replay systems log to stay cheap?