Deterministic Replay Debugging

The reproducibility problem

A concurrency bug may appear once in a thousand runs. Without a way to reproduce it, you cannot study it under a debugger.

What replay records

Deterministic replay records the few sources of nondeterminism so a later run follows the same path:

the order threads acquired locks
the results of nondeterministic reads such as inputs and timers
the interleaving of accesses to shared memory

During replay, the recorded log dictates every nondeterministic choice, so the failing run repeats exactly.
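The record-then-force idea can be sketched with a toy cooperative scheduler. This is a minimal illustration under stated assumptions, not how any particular replay tool is built: threads are modeled as Python generators that yield between a shared-memory read and the following write, and the names `worker`, `run`, `record`, and `replay` are hypothetical. In record mode the scheduler picks the next task at random and logs each pick; in replay mode it takes the picks from the log instead.

```python
import random

def worker(shared):
    # Non-atomic increment: read, hand control back to the scheduler, then write.
    tmp = shared["counter"]
    yield
    shared["counter"] = tmp + 1

def run(num_workers, choose):
    """Run the workers to completion; `choose` decides which task steps next."""
    shared = {"counter": 0}
    tasks = [worker(shared) for _ in range(num_workers)]
    runnable = list(range(num_workers))
    while runnable:
        idx = choose(runnable)
        try:
            next(tasks[idx])
        except StopIteration:
            runnable.remove(idx)  # this task has finished
    return shared["counter"]

def record(num_workers, seed):
    """Record mode: make random scheduling choices and log each one."""
    rng = random.Random(seed)
    log = []
    def choose(runnable):
        pick = rng.choice(runnable)
        log.append(pick)
        return pick
    return run(num_workers, choose), log

def replay(num_workers, log):
    """Replay mode: force the logged scheduling choices, in order."""
    picks = iter(log)
    def choose(runnable):
        pick = next(picks)
        assert pick in runnable, "log diverged from execution"
        return pick
    return run(num_workers, choose)
```

Feeding `replay` the schedule `[0, 1, 0, 1]` with two workers interleaves the two increments so that one update is lost, and it does so on every call. Any log returned by `record` can be passed to `replay` to reproduce the same final counter.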

The cost tradeoff

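Recording every instruction and memory access is expensive. Practical systems log only the nondeterministic events, such as scheduling decisions and external inputs, and re-execute the deterministic computation during replay. This keeps the log small while still reproducing the bug.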
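To make the tradeoff concrete, here is a small sketch in which only the result of a nondeterministic read is logged and the deterministic work is recomputed on replay. The `InputLog` class and the `checksum` function are hypothetical names chosen for illustration, and the timer read stands in for any external input.

```python
import time

class InputLog:
    """Logs the results of nondeterministic reads, or plays them back."""

    def __init__(self, recorded=None):
        self.replaying = recorded is not None
        self.entries = list(recorded) if self.replaying else []
        self.pos = 0

    def read(self, source):
        if self.replaying:
            # Replay: ignore the live source and return the logged value.
            value = self.entries[self.pos]
            self.pos += 1
            return value
        # Record: call the live source and remember what it returned.
        value = source()
        self.entries.append(value)
        return value

def checksum(log, rounds=1000):
    # The only nondeterministic read in this computation goes through the log.
    acc = log.read(time.monotonic_ns)
    for i in range(rounds):
        # Deterministic arithmetic: never logged, simply re-executed on replay.
        acc = (acc * 31 + i) % 1_000_003
    return acc
```

A recording run such as `rec = InputLog(); checksum(rec)` leaves a single entry in `rec.entries` regardless of `rounds`, and `checksum(InputLog(rec.entries))` returns the same value as the recorded run.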

Key idea

Replay turns a rare nondeterministic failure into a repeatable one by logging the scheduling and input choices, then forcing those exact choices on every replay.
