What it is
A distributed deadlock occurs when processes on different nodes each hold a resource and wait for one held by another, forming a cycle of waiting that never resolves. It is the classic deadlock, but spread across a network, where no single node sees the whole picture.
The four conditions
Deadlock requires all four conditions, exactly as on a single machine:
- Mutual exclusion: a resource is held exclusively by one process at a time
- Hold and wait: a process holds one resource while waiting for another
- No preemption: a held resource cannot be forcibly taken away
- Circular wait: a cycle exists in the wait-for graph
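The following minimal Python sketch illustrates the circular-wait condition and why no single node sees it. The per-node dictionaries, `has_cycle`, and `merge` are hypothetical names. Each node is assumed to record only the wait-for edges of its own processes. Every local graph is acyclic, yet their union contains the cycle P1 -> P2 -> P3 -> P1.

```python
from typing import Dict, Set

# Hypothetical model: each node records only the wait-for edges of its own
# processes, as a mapping waiter -> set of processes it is blocked on.
Graph = Dict[str, Set[str]]


def has_cycle(graph: Graph) -> bool:
    """Return True if the wait-for graph contains a cycle (circular wait).

    Depth-first search with three colours: unvisited, on the current path,
    and finished. Meeting a process that is still on the current path means
    we followed a back edge, i.e. a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {h for holders in graph.values() for h in holders}
    colour = {n: WHITE for n in nodes}

    def visit(n: str) -> bool:
        colour[n] = GREY
        for m in graph.get(n, ()):
            if colour[m] == GREY:
                return True
            if colour[m] == WHITE and visit(m):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


def merge(*graphs: Graph) -> Graph:
    """Union of several local wait-for graphs (the view no single node has)."""
    merged: Graph = {}
    for g in graphs:
        for waiter, holders in g.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


# P1 (node A) waits for P2, P2 (node B) waits for P3, P3 (node C) waits for P1.
node_a: Graph = {"P1": {"P2"}}
node_b: Graph = {"P2": {"P3"}}
node_c: Graph = {"P3": {"P1"}}

print([has_cycle(g) for g in (node_a, node_b, node_c)])  # [False, False, False]
print(has_cycle(merge(node_a, node_b, node_c)))           # True
```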
Detection across nodes
No node has the full wait-for graph, so detection is harder than on a single machine:
- Edge chasing: a blocked process sends probe messages along wait-for edges; if a probe returns to its originator, a cycle exists
- Centralized detection: a coordinator assembles a global graph from the nodes' local graphs, risking a bottleneck and false (phantom) cycles from stale data
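Edge chasing can be simulated in a single process. The sketch below follows the (initiator, sender, receiver) probe format of the Chandy-Misra-Haas algorithm for the AND model, but it rests on several assumptions. `edge_chase` is a hypothetical name, and a deque stands in for the network. All wait-for edges sit in one dict here, whereas in practice each node would know only its own processes' edges.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple

# A probe message: (initiator, sender, receiver).
Probe = Tuple[str, str, str]


def edge_chase(waits_for: Dict[str, Set[str]], initiator: str) -> bool:
    """Return True if a probe started by `initiator` comes back to it."""
    # The blocked initiator sends one probe along each of its wait-for edges.
    network: Deque[Probe] = deque(
        (initiator, initiator, holder) for holder in waits_for.get(initiator, ())
    )
    forwarded: Set[str] = set()  # processes that already relayed this probe
    while network:
        init, _sender, receiver = network.popleft()
        if receiver == init:
            return True  # the probe travelled a full cycle: deadlock
        if receiver in forwarded:
            continue  # avoid circling forever on a cycle not containing init
        forwarded.add(receiver)
        # A receiver that is itself blocked forwards the probe along each of
        # its own wait-for edges; an unblocked receiver has none and drops it.
        for holder in waits_for.get(receiver, ()):
            network.append((init, receiver, holder))
    return False


waits_for = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}, "P4": {"P1"}}
print(edge_chase(waits_for, "P1"))  # True
print(edge_chase(waits_for, "P4"))  # False
```

P4 is blocked behind the cycle but is not part of it, so its probe never returns to it. Each process relays a given initiator's probe only once, which keeps the simulation from looping.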
Prevention and recovery
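- Resource ordering: every process acquires resources in the same global order, so cycles cannot form
- Timeouts: abort a transaction that has waited too long
- Wait-die or wound-wait: schemes that use transaction timestamps to decide which transaction aborts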
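The prevention and recovery techniques above can be sketched in a single-process Python model. This is an illustration only, not a distributed implementation. `acquire_in_order`, `wait_die`, and `wound_wait` are hypothetical helpers, and local `threading.Lock` objects stand in for resources on remote nodes. Sorting by name is an assumed global order, and timestamps are plain integers where a smaller value means an older transaction.

```python
import threading
from typing import Dict, List


def acquire_in_order(locks: Dict[str, threading.Lock], names: List[str],
                     timeout: float = 1.0) -> List[str]:
    """Acquire the named locks in one global order (here: sorted by name).

    Because every caller uses the same order, no cycle of waits can form.
    If a lock is not obtained within `timeout` seconds, everything taken so
    far is released and TimeoutError is raised; the caller would then abort.
    """
    taken: List[str] = []
    for name in sorted(names):
        if not locks[name].acquire(timeout=timeout):
            for held in reversed(taken):
                locks[held].release()
            raise TimeoutError(f"gave up waiting for {name}")
        taken.append(name)
    return taken


# Timestamps: a smaller value means an older transaction.
def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Non-preemptive: an older requester waits; a younger one dies."""
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Preemptive: an older requester wounds the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"


locks = {n: threading.Lock() for n in ("A", "B")}
held = acquire_in_order(locks, ["B", "A"])
print(held)  # ['A', 'B']
for name in reversed(held):
    locks[name].release()

print(wait_die(5, 10), "|", wait_die(10, 5))      # wait | abort requester
print(wound_wait(5, 10), "|", wound_wait(10, 5))  # abort holder | wait
```

In both timestamp schemes only the younger transaction is ever aborted. An aborted transaction is normally restarted with its original timestamp, so it eventually becomes the oldest and cannot starve.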
Key idea
Distributed deadlock is a circular wait spread across nodes that no single node fully sees. It is detected by edge-chasing probes and prevented by resource ordering or timestamp-based abort schemes.