The illusion of safety
A distributed lock lets one client at a time access a resource. It feels safe, but a subtle failure breaks it. Suppose client A acquires the lock, then pauses for a long garbage collection or a network stall. The lock service times out the lease and grants the lock to client B. Now A wakes up, still believing it holds the lock, and writes. Both A and B are now writing to the resource, which is exactly what the lock was supposed to prevent.
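The failure can be replayed in a few lines. This is only a toy sketch: `LeaseLock`, `UnfencedStorage`, the integer `now` clock, and the lease length of 10 ticks are all hypothetical names and values chosen for illustration, not the API of any real lock service.

```python
class LeaseLock:
    """Toy lock service: grants a lease that expires after `ttl` ticks (hypothetical)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0

    def acquire(self, client, now):
        # Grant the lock if it is free or the previous lease has expired.
        if self.holder is None or now >= self.expires_at:
            self.holder = client
            self.expires_at = now + self.ttl
            return True
        return False


class UnfencedStorage:
    """Storage that accepts every write; it cannot tell a stale holder from a live one."""

    def __init__(self):
        self.log = []

    def write(self, client, value):
        self.log.append((client, value))


lock = LeaseLock(ttl=10)
storage = UnfencedStorage()

a_has_lock = lock.acquire("A", now=0)    # A acquires the lock, then pauses
b_has_lock = lock.acquire("B", now=15)   # A's lease has expired, so B is granted the lock
storage.write("B", "b-data")

if a_has_lock:                           # A wakes and trusts its stale local belief
    storage.write("A", "a-data")         # nothing stops this write
```

After this runs, the lock service says B is the holder, yet the storage log contains A's write after B's. The lock did its job; the pause simply made A's knowledge of it out of date.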
The fix is a fencing token
Each time the lock is granted, the service returns a monotonically increasing number called a fencing token. The client must include this token with every write to the storage system.
- A gets token 33, then pauses.
- B gets token 34 and writes successfully.
- A wakes and writes with token 33, and the storage system rejects the write because it has already accepted one with token 34.
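The three steps above can be sketched as follows. The class names (`LockService`, `FencedStorage`, `StaleTokenError`) are made up for this example, lease expiry is left out to keep it short, and the counter starts at 33 only so the numbers match the list. Treat it as one possible shape of the check, not a reference implementation.

```python
import threading


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already seen."""


class LockService:
    """Toy lock service that hands out an increasing token with each grant.

    Lease tracking and expiry are omitted; only token issuance is shown.
    """

    def __init__(self, first_token=33):
        self._next_token = first_token

    def grant(self, client):
        token = self._next_token
        self._next_token += 1
        return token


class FencedStorage:
    """Storage that remembers the highest token seen and rejects lower ones."""

    def __init__(self):
        self._highest_token = -1
        self._value = None
        self._mutex = threading.Lock()  # makes check-and-write a single atomic step

    def write(self, token, value):
        with self._mutex:
            if token < self._highest_token:
                raise StaleTokenError(
                    f"token {token} is older than {self._highest_token}"
                )
            self._highest_token = token
            self._value = value

    def read(self):
        return self._value


service = LockService()
storage = FencedStorage()

token_a = service.grant("A")           # A gets token 33, then pauses
token_b = service.grant("B")           # lease expired (not modeled): B gets token 34
storage.write(token_b, "b-data")       # accepted

try:
    storage.write(token_a, "a-data")   # A wakes and writes with its old token
    rejected = False
except StaleTokenError:
    rejected = True                    # storage has already seen 34
```

Two choices in this sketch are worth noting. The comparison is `<` rather than `<=`, so the current holder can write more than once with the same token. The comparison and the write also happen under one mutex, because a check that is separate from the write would reopen the same race inside the storage system.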
Why the storage must check
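The lock service alone cannot stop A, because A acts independently after waking and never consults the service before it writes. Only the resource itself, by remembering the highest token it has accepted and refusing anything lower, can reject the stale writer. The token turns a hopeful lock into an enforced one.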
Takeaway
Never rely on a distributed lock alone for correctness when clients can pause or messages can be delayed. Combine it with fencing tokens checked at the resource so a delayed old holder can never overwrite newer work.
Key idea
Fencing tokens are monotonic numbers checked at the storage layer so a paused or stale lock holder cannot corrupt data after losing the lock.