Distributed Locks With Leases
A lock inside one process is released when the holder finishes, and even a crash frees it because the process dies. A distributed lock spans many machines, so a holder can crash, hang, or be partitioned away while still appearing to own the lock. Nobody is left to release it.
The fix is a lease: a lock with a built-in expiry. When a client acquires the lock it gets the resource for a bounded time, say ten seconds. If it finishes early it releases the lease. If it crashes, the lease simply expires and the resource becomes available to others. No human or watchdog has to clean up.
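The acquire, release, and expire behavior can be sketched in a few lines. This is a minimal single-process stand-in for a lock service, not a real distributed implementation. The class name LeaseTable, its methods, and the injectable clock parameter are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LeaseTable:
    """In-memory stand-in for a lock service (hypothetical, single process)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # resource name -> (holder, expiry time on the service's clock)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def acquire(self, resource: str, client: str, ttl: float) -> bool:
        """Grant the lease unless another unexpired lease exists."""
        now = self._clock()
        current = self._leases.get(resource)
        if current is not None and current[1] > now:
            return False  # still held and not yet expired
        self._leases[resource] = (client, now + ttl)
        return True

    def release(self, resource: str, client: str) -> None:
        """Early release; only the recorded holder may release."""
        current = self._leases.get(resource)
        if current is not None and current[0] == client:
            del self._leases[resource]

    def holder(self, resource: str) -> Optional[str]:
        """Return the current holder, or None if free or expired."""
        current = self._leases.get(resource)
        if current is None or current[1] <= self._clock():
            return None
        return current[0]
```

With a fake clock passed in as `clock`, a holder that acquires at time 0 with a ten second lease and then never calls release is replaced automatically: a second client's acquire fails at time 5 and succeeds at time 10.5, with no cleanup step in between.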
Leases introduce a new hazard. The lock service measures time, but the holder also measures time, and clocks drift. A holder may believe its lease is still valid while the service has already expired it and granted the lock to someone else. Now two clients think they hold the lock at once.
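A small calculation makes the disagreement concrete. The sketch below assumes a simple model in which the holder's clock runs at a constant fraction of true time. The function names and the clock_rate parameter are hypothetical, and the 10% drift is deliberately exaggerated for readability.

```python
from typing import Optional, Tuple


def service_view(elapsed_true: float, ttl: float) -> bool:
    """True while the service still considers the lease live."""
    return elapsed_true < ttl


def holder_view(elapsed_true: float, ttl: float, clock_rate: float) -> bool:
    """True while the holder's own clock says the lease is live.

    clock_rate is holder seconds per true second (an assumed model);
    0.9 means the holder's clock runs 10% slow.
    """
    return elapsed_true * clock_rate < ttl


def disagreement_window(ttl: float, clock_rate: float) -> Optional[Tuple[float, float]]:
    """True-time interval in which only the holder believes the lease is live."""
    if clock_rate >= 1.0:
        return None  # a fast or exact holder clock gives up the lease in time
    return (ttl, ttl / clock_rate)


ttl = 10.0
elapsed = 10.5  # true seconds since the lease was granted
expired_at_service = not service_view(elapsed, ttl)
holder_still_confident = holder_view(elapsed, ttl, clock_rate=0.9)
```

In this model the service expires the lease at ten true seconds, while a holder with a slow clock keeps trusting it for somewhat longer. Any grant the service makes during that interval produces two clients that both think they hold the lock.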
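A few rules of thumb reduce the risk:
- Keep leases short to limit damage from a stuck holder.
- Renew before expiry with a heartbeat if the work runs long.
- Never assume the lease is still yours just because your own clock says it has not expired; recheck the remaining time before each critical step.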
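The rules above can be sketched from the holder's side. This is one possible approach, not a prescribed design: the HeldLease class, its method names, the safety margin, and the renew-at-half-life policy are all assumptions chosen for illustration. The deadline is measured from the moment the request was sent, so the holder's estimate errs on the early side.

```python
import time
from typing import Callable


class HeldLease:
    """Client-side view of a lease (hypothetical helper, not a library API)."""

    def __init__(
        self,
        ttl: float,
        margin: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._margin = margin  # slack reserved for drift and delays
        self._clock = clock
        # Assumes the object is created just before the acquire request
        # is sent, so the local deadline is not later than the service's
        # (ignoring drift).
        self._deadline = clock() + ttl

    def safe_to_work(self) -> bool:
        """Check before each critical step; stop once inside the margin."""
        return self._clock() < self._deadline - self._margin

    def should_renew(self) -> bool:
        """Heartbeat policy (assumed): renew once half the lease is used."""
        return self._clock() >= self._deadline - self._ttl / 2

    def renewed(self, request_sent_at: float) -> None:
        """Record a successful renewal, timed from when it was sent."""
        self._deadline = request_sent_at + self._ttl
```

A worker loop would call `safe_to_work()` before each step, send a renewal when `should_renew()` turns true, and call `renewed()` only after the service confirms. The margin narrows the window in which a drifting clock can mislead the holder; it does not remove it.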
Key idea
A lease is a self-expiring distributed lock that survives a crashed holder, but clock drift means a holder can wrongly believe it still owns an expired lease.