One at a Time
Some jobs must never run concurrently: a migration, a cache rebuild, or a single writer to an external system. With many workers, you need a distributed lock so only one holder runs the job at a time.
Lease, Not a Lock Forever
A holder that crashes while owning a permanent lock would block the job forever. So locks are leases with an expiry. If the holder does not renew, the lease expires and another worker can take over. A live holder renews periodically to keep working.
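The acquire, renew, and expire cycle can be sketched in a few lines. This is a minimal single-process illustration, not a production lock: `LeaseStore`, its `ttl` parameter, and the injectable `clock` are hypothetical names chosen for this example, and a real deployment would keep the lease in a shared store with atomic compare-and-set operations.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseStore:
    """In-memory stand-in for a shared lease store (hypothetical, illustration only)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be simulated without sleeping
        self.lease: Optional[Lease] = None

    def acquire(self, worker: str) -> bool:
        """Grant the lease only if it is free or the previous one has expired."""
        now = self.clock()
        if self.lease is None or self.lease.expires_at <= now:
            self.lease = Lease(holder=worker, expires_at=now + self.ttl)
            return True
        return False

    def renew(self, worker: str) -> bool:
        """Extend the lease, but only for the current holder and only before expiry."""
        now = self.clock()
        lease = self.lease
        if lease is not None and lease.holder == worker and lease.expires_at > now:
            lease.expires_at = now + self.ttl
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    store = LeaseStore(ttl=10.0, clock=lambda: fake_now[0])

    print(store.acquire("worker-a"))  # worker-a takes the lease
    print(store.acquire("worker-b"))  # refused while the lease is live

    fake_now[0] = 8.0
    print(store.renew("worker-a"))    # live holder extends its lease

    fake_now[0] = 30.0                # worker-a "crashed" and stopped renewing
    print(store.acquire("worker-b"))  # lease expired, worker-b takes over
    print(store.renew("worker-a"))    # the old holder can no longer renew
```

In this sketch a failed `renew` is the signal that the worker has lost the lease and should stop the job. The time-to-live is a trade-off: a short value replaces a crashed holder quickly but leaves less slack for a slow renewal.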
The Split Brain Risk
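Leases create a danger of their own. Suppose a holder stalls past its expiry, for example during a long garbage-collection pause. Another worker acquires the lease. When the first holder resumes, it still believes the lease is its own, so two workers now think they hold it and both may write.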
Fencing Tokens
Defend with a fencing token: a number that increases each time the lease is granted. The protected resource records the highest token it has seen and rejects any write carrying a lower token. The stale holder thus gets fenced out even if it wakes up and tries to write.
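The check on the resource side might look like the following sketch. `TokenIssuer` and `FencedResource` are hypothetical names for this example. The sketch assumes the lease service hands out the token together with the lease, and that the resource can compare and record tokens atomically with each write.

```python
class TokenIssuer:
    """Hypothetical lease service that returns a strictly increasing token per grant."""

    def __init__(self) -> None:
        self._last_token = 0

    def grant(self) -> int:
        self._last_token += 1
        return self._last_token


class FencedResource:
    """Hypothetical protected resource that rejects writes carrying a stale token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.writes: list[str] = []

    def write(self, token: int, value: str) -> bool:
        # Reject anything older than the newest token seen so far.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.writes.append(value)
        return True


if __name__ == "__main__":
    issuer = TokenIssuer()
    resource = FencedResource()

    token_a = issuer.grant()  # worker A is granted the lease, then stalls
    token_b = issuer.grant()  # A's lease expires and worker B is granted the next one

    print(resource.write(token_b, "from B"))  # accepted, resource records B's token
    print(resource.write(token_a, "from A"))  # rejected, A's token is lower
    print(resource.writes)
```

Note that the resource can only reject tokens lower than one it has already seen. In this sketch, a stale write that arrives before the new holder's first write would still be accepted.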
Keep the Critical Section Small
Hold the lock only around the part that truly requires exclusivity. Long-held locks reduce availability and raise the chance that the lease expires mid-run.
Key idea
A singleton job uses a lease with renewal so a crashed holder is replaced, and a fencing token blocks a stale holder from corrupting the resource.