System Design

Distributed Locks with Fencing Tokens

Why a paused lock holder can corrupt data and how monotonic tokens fence it out.

6 min read · advanced

The illusion of safety

A distributed lock lets one client at a time access a resource. It feels safe, but a subtle failure breaks it. Suppose client A acquires the lock, then pauses for a long garbage collection or a network stall. The lock service times out the lease and grants the lock to client B. Now A wakes up, still believing it holds the lock, and writes. Two clients are writing as if each held the lock, and A's stale write can silently overwrite B's.
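The failure can be reproduced in a few lines. This is a minimal sketch, not a real lock service: `LeaseLock`, `UnfencedStorage`, and the logical `now` timestamps are hypothetical names invented for illustration, and the pause is simulated by simply skipping ahead in time.

```python
class LeaseLock:
    """Toy lease-based lock driven by a logical clock (hypothetical)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0

    def acquire(self, client, now):
        # Grant the lock if it is free or the previous lease has expired.
        if self.holder is None or now >= self.expires_at:
            self.holder = client
            self.expires_at = now + self.ttl
            return True
        return False


class UnfencedStorage:
    """Accepts every write; it cannot tell a stale holder from the current one."""

    def __init__(self):
        self.value = None

    def write(self, value):
        self.value = value


lock = LeaseLock(ttl=10)
storage = UnfencedStorage()

a_has_lock = lock.acquire("A", now=0)    # A acquires the lock at t=0
# A pauses here (GC, network stall) past the lease expiry at t=10.
b_has_lock = lock.acquire("B", now=15)   # lease expired, so B gets the lock
storage.write("written by B")

# A wakes at t=20 and still trusts the flag it read before the pause.
if a_has_lock:
    storage.write("stale write by A")    # silently overwrites B's work

print(storage.value)  # prints: stale write by A
```

Nothing in the sketch misbehaves on its own: the lock expires the lease correctly and the storage does what it is told. The corruption comes from A acting on knowledge that went stale during the pause.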

The fix is a fencing token

Each time the lock is granted, the service returns a monotonically increasing number called a fencing token. The client must include this token with every write to the storage system.
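Issuing the token can be as simple as a counter that advances on every grant. The sketch below assumes a single-process lock service with a logical clock; `FencingLockService`, `ttl`, and `now` are hypothetical names, and a real service would also have to persist the counter so it never goes backwards after a restart.

```python
import itertools


class FencingLockService:
    """Toy lock service that returns a fencing token with each grant (hypothetical)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0
        self._tokens = itertools.count(1)  # 1, 2, 3, ... never repeats or decreases

    def acquire(self, client, now):
        # Refuse while another lease is still live.
        if self.holder is not None and now < self.expires_at:
            return None
        self.holder = client
        self.expires_at = now + self.ttl
        # Every successful grant gets a token larger than all earlier ones.
        return next(self._tokens)


service = FencingLockService(ttl=10)
token_a = service.acquire("A", now=0)    # first grant
denied = service.acquire("B", now=5)     # A's lease is still live
token_b = service.acquire("B", now=15)   # A's lease expired; B gets a larger token

print(token_a, denied, token_b)  # prints: 1 None 2
```

The only property that matters is ordering: a later grant always carries a larger token than an earlier one, so the storage system can tell which holder is newer.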

  • A gets token 33 then pauses.
  • B gets token 34 and writes successfully.
  • A wakes, writes with token 33, and storage rejects the write because it has already accepted a write with the higher token 34.
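The scenario above can be sketched on the storage side as follows. This is an illustrative in-memory model, not a real storage API: `FencedStorage`, `StaleTokenError`, and `highest_token` are hypothetical names. It also assumes the compare-and-write step is atomic, which a real system would have to guarantee itself, for example with a conditional write.

```python
class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already seen."""


class FencedStorage:
    """Toy storage that remembers the highest fencing token it has accepted."""

    def __init__(self):
        self.value = None
        self.highest_token = 0

    def write(self, token, value):
        # Reject any write whose token is lower than one already accepted.
        # An equal token is allowed so the current holder can write repeatedly.
        if token < self.highest_token:
            raise StaleTokenError(
                f"token {token} is older than {self.highest_token}"
            )
        self.highest_token = token
        self.value = value


storage = FencedStorage()

# A was granted token 33 and then paused; B was granted token 34.
storage.write(34, "written by B")          # accepted

try:
    storage.write(33, "stale write by A")  # A wakes and tries to write
    rejected = False
except StaleTokenError:
    rejected = True

print(rejected, storage.value)  # prints: True written by B
```

A never needs to learn that its lease expired. The stale write fails at the resource regardless of what A believes.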

Why the storage must check

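The lock service alone cannot stop A: once A wakes, it writes to storage directly, and the lock service is not on that path. Only the resource itself can reject the stale writer, by comparing each incoming token with the highest one it has already seen. The token turns a hopeful lock into an enforced one.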

Takeaway

Never rely on a distributed lock alone for correctness when a holder can pause or stall. Combine it with fencing tokens checked at the resource so a delayed old holder can never overwrite newer work.

Key idea

Fencing tokens are monotonic numbers checked at the storage layer so a paused or stale lock holder cannot corrupt data after losing the lock.

Check yourself

1. What problem do fencing tokens solve?

2. What property must a fencing token have?

3. Who must actually enforce the fencing token?