System Design

Distributed Locks for Singleton Jobs

Ensure only one instance of a job runs at a time by using a lease and a fencing token.

6 min read · advanced

One at a Time

Some jobs must never run concurrently: a migration, a cache rebuild, or a single writer to an external system. With many workers, you need a distributed lock so only one holder runs the job at a time.

Lease, Not a Lock Forever

A holder that crashes while owning a permanent lock would block the job forever. So locks are leases with an expiry. If the holder does not renew, the lease expires and another worker can take over. A live holder renews periodically to keep working.
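As a rough illustration of acquire, renew, and takeover after expiry, here is a minimal in-memory sketch. The names `Lease` and `LeaseManager`, the 10-second TTL, and the explicit `now` argument are all assumptions made for the example; a real lease would live in a shared, consistent store rather than in one process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseManager:
    """Toy lease store (hypothetical); time is passed in so the demo is deterministic."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.current: Optional[Lease] = None

    def acquire(self, worker: str, now: float) -> bool:
        # Grant the lease only if it is free or the previous lease has expired.
        if self.current is None or now >= self.current.expires_at:
            self.current = Lease(worker, now + self.ttl)
            return True
        return False

    def renew(self, worker: str, now: float) -> bool:
        # Only the current, unexpired holder may push the expiry forward.
        lease = self.current
        if lease is not None and lease.holder == worker and now < lease.expires_at:
            lease.expires_at = now + self.ttl
            return True
        return False


manager = LeaseManager(ttl=10.0)
assert manager.acquire("worker-a", now=0.0)       # A takes the lease
assert not manager.acquire("worker-b", now=5.0)   # B is blocked while A holds it
assert manager.renew("worker-a", now=8.0)         # A renews; expiry moves to 18.0
assert not manager.acquire("worker-b", now=12.0)  # still held thanks to the renewal
# Suppose A crashes here and stops renewing; the lease lapses at 18.0.
assert manager.acquire("worker-b", now=20.0)      # B takes over
assert not manager.renew("worker-a", now=21.0)    # A can no longer renew
```

Passing `now` explicitly keeps the example testable; a production system would also have to account for clock differences between the workers and the lease store.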

The Split Brain Risk

Leases create a danger. A holder may pause, for example during a long garbage collection, until after its lease has expired. Another worker then acquires the lease. Now two workers believe they hold it, and both may write.
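The timeline can be replayed in a few lines. This is only a simulation under assumed names (`LeaseTable`, `writes`, `a_believes_it_holds`) with hand-picked timestamps; it shows that an unprotected resource accepts the write from the paused worker.

```python
class LeaseTable:
    """Toy lease record (hypothetical); time is passed in explicitly."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, worker: str, now: float) -> bool:
        # Grant the lease if it is free or has expired.
        if self.holder is None or now >= self.expires_at:
            self.holder, self.expires_at = worker, now + self.ttl
            return True
        return False


writes = []  # unprotected resource: it accepts every write it receives

leases = LeaseTable(ttl=10.0)

# t=0: worker A acquires the lease, then stalls (for example in a GC pause).
a_believes_it_holds = leases.acquire("worker-a", now=0.0)

# t=12: A's lease expired at t=10, so worker B acquires it and writes.
if leases.acquire("worker-b", now=12.0):
    writes.append("worker-b")

# t=15: A wakes up. It never re-checked the lease, so it still thinks
# it is the holder and writes as well.
if a_believes_it_holds:
    writes.append("worker-a")

print(writes)  # both workers wrote, although only B held a valid lease
```

Re-checking the lease just before writing narrows the window but does not close it, since the pause can also happen between the check and the write.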

Fencing Tokens

Defend with a fencing token: a number that increases each time the lease is granted. The protected resource records the highest token it has seen and rejects any write carrying a lower token. The stale holder thus gets fenced out even if it wakes up and tries to write.
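A minimal sketch of that check, assuming a lease service that hands out a counter and a resource that can compare tokens on every write. `LeaseService`, `FencedStore`, and `StaleTokenError` are hypothetical names for illustration, not any particular library's API.

```python
class StaleTokenError(Exception):
    """Raised when a write carries a token lower than one already seen."""


class LeaseService:
    """Hands out a strictly increasing token with each lease grant (toy version)."""

    def __init__(self) -> None:
        self._last_token = 0

    def grant(self) -> int:
        self._last_token += 1
        return self._last_token


class FencedStore:
    """Protected resource: remembers the highest token it has accepted."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> None:
        # Reject writes from a holder whose lease has since been re-granted.
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.data[key] = value


service = LeaseService()
store = FencedStore()

token_a = service.grant()  # worker A gets token 1, then pauses past its expiry
token_b = service.grant()  # worker B is granted the lease with token 2

store.write(token_b, "status", "written by B")

try:
    store.write(token_a, "status", "written by A")  # stale holder wakes up
except StaleTokenError:
    print("stale write rejected")

assert store.data["status"] == "written by B"
```

The sketch relies on the resource performing the comparison and the write as one atomic step; in a single-threaded example that comes for free, but a real store would need to guarantee it.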

Keep the Critical Section Small

Hold the lock only around the truly exclusive part. Long-held locks reduce availability and raise the chance that the lease expires mid-run.
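One way to picture this, using a cache rebuild as the job: do the slow preparation without the lock and hold it only while publishing the result. In this sketch a local `threading.Lock` stands in for the distributed lease, and `build_entries`, `rebuild_cache`, and `cache` are made-up names; whether the preparation step is really safe to run unlocked depends on the job.

```python
import threading

lease = threading.Lock()  # local stand-in for a distributed lease (assumption)
cache = {"version": 0, "entries": {}}


def build_entries() -> dict:
    # Slow but non-exclusive work: it touches no shared state,
    # so it does not need the lock.
    return {n: n * n for n in range(1000)}


def rebuild_cache() -> None:
    new_entries = build_entries()  # done before taking the lock
    with lease:                    # exclusive part only: publish the result
        cache["entries"] = new_entries
        cache["version"] += 1


rebuild_cache()
assert cache["version"] == 1
```

The lock is held for two assignments instead of the whole build, which leaves far less time for a lease to expire while the holder is still inside it.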

Key idea

A singleton job uses a lease with renewal so a crashed holder is replaced, and a fencing token blocks a stale holder from corrupting the resource.

Check yourself

1. Why are distributed locks usually implemented as leases with expiry?

2. What problem does a fencing token solve?

3. Why keep the critical section under the lock small?