Transactions On Top Of A Key-Value Store
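The Percolator design adds multi-row transactions to a store that natively offers only single-row atomicity. It provides snapshot isolation using two timestamps per transaction, a start timestamp and a commit timestamp. Both come from a central timestamp oracle that hands out strictly increasing values, so all transactions fall into one global order. Reads see the snapshot as of the start timestamp, and writes become visible at the commit timestamp.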
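The timestamp scheme can be illustrated with a minimal sketch. Everything named here (TimestampOracle, next_timestamp, visible_version) is hypothetical and not taken from any real implementation; the oracle is an in-process counter standing in for a central service, and a row's history is assumed to be a list of (commit_ts, value) pairs.

```python
import itertools
import threading


class TimestampOracle:
    """Hypothetical in-process stand-in for a central timestamp oracle."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._mutex = threading.Lock()

    def next_timestamp(self):
        # Strictly increasing values put every start and commit
        # timestamp into a single global order.
        with self._mutex:
            return next(self._counter)


def visible_version(write_records, start_ts):
    """Return the newest (commit_ts, value) committed at or before start_ts.

    write_records is an assumed list of (commit_ts, value) pairs for one row.
    Returns None when nothing was committed before the snapshot.
    """
    candidates = [rec for rec in write_records if rec[0] <= start_ts]
    return max(candidates, key=lambda rec: rec[0]) if candidates else None


oracle = TimestampOracle()
t1_start = oracle.next_timestamp()   # transaction 1 begins
t1_commit = oracle.next_timestamp()  # transaction 1 commits
t2_start = oracle.next_timestamp()   # transaction 2 begins afterwards
history = [(t1_commit, "v1")]
```

A reader whose start timestamp is later than t1_commit sees "v1", while a reader that started earlier sees nothing from transaction 1, which is the snapshot behaviour described above.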
Locks And The Primary
A transaction first prewrites: for each row it modifies, it stores the new data and a lock, both at the start timestamp.
- One row is chosen as the primary. Its lock is the primary lock, and the locks on all other (secondary) rows point to it.
- The transaction commits by atomically replacing the primary lock with a write record at the commit timestamp. That single-row operation is the commit point: once the primary is committed, the whole transaction is committed.
- Each secondary lock is then replaced with its own write record. This can happen lazily, since the outcome is already decided.
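The prewrite and commit steps above can be sketched as follows. This is a simplified single-threaded model under stated assumptions, not a definitive implementation: the Store class and its three dictionaries (data, lock, write) are hypothetical stand-ins for per-row columns, and a plain dictionary update stands in for the store's single-row atomic operation.

```python
class Store:
    """Hypothetical in-memory store with data, lock, and write columns."""

    def __init__(self):
        self.data = {}   # (row, start_ts) -> value
        self.lock = {}   # row -> (start_ts, primary_row)
        self.write = {}  # (row, commit_ts) -> start_ts of the committed data


def prewrite(store, row, value, start_ts, primary):
    # Fail on any existing lock, or on a write committed after our snapshot.
    if row in store.lock:
        return False
    if any(r == row and commit_ts > start_ts for (r, commit_ts) in store.write):
        return False
    store.data[(row, start_ts)] = value
    store.lock[row] = (start_ts, primary)
    return True


def commit_row(store, row, start_ts, commit_ts):
    # Replace our lock with a write record pointing at the data version.
    held = store.lock.get(row)
    if held is None or held[0] != start_ts:
        return False  # our lock is gone, e.g. another client rolled us back
    store.write[(row, commit_ts)] = start_ts
    del store.lock[row]
    return True


def run_transaction(store, writes, start_ts, commit_ts):
    rows = list(writes)
    primary = rows[0]  # assumption: the first row serves as the primary
    for i, row in enumerate(rows):
        if not prewrite(store, row, writes[row], start_ts, primary):
            # Undo the rows we already prewrote, then give up.
            for done in rows[:i]:
                store.lock.pop(done, None)
                store.data.pop((done, start_ts), None)
            return False
    # Commit point: the transaction is committed iff this step succeeds.
    if not commit_row(store, primary, start_ts, commit_ts):
        return False
    # Secondaries could be left for later readers; here we finish eagerly.
    for row in rows[1:]:
        commit_row(store, row, start_ts, commit_ts)
    return True
```

For example, run_transaction(Store(), {"a": 1, "b": 2}, start_ts=10, commit_ts=11) prewrites both rows with "a" as primary, commits "a", and then commits "b". In a real deployment the timestamps would come from the oracle rather than being passed in by hand.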
Recovering From Crashes
Because commit hinges on the primary, a client that finds a stale lock can resolve it by inspecting the primary.
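- If the primary already has a write record, the transaction committed, so the reader rolls the secondary forward by replacing its lock with a write record.
- If the primary is still locked and the lock has expired, the reader rolls the transaction back, removing the primary lock first so the original owner can no longer commit.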
This lazy cleanup means no central coordinator has to survive a crash; any later reader can repair an abandoned transaction.
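The recovery rule can be sketched as a single function. The names are hypothetical: data, lock, and write are plain dictionaries standing in for per-row columns, and lock_expired is an assumed flag supplied by the caller, since how expiry is detected is not covered here. The sketch also treats one case beyond the two listed above: if the primary lock is gone and no write record exists, the transaction was already rolled back, so the stale secondary lock is simply removed.

```python
def resolve_lock(data, lock, write, row, lock_expired):
    """Resolve a stale lock found on `row`.

    data:  (row, start_ts) -> value
    lock:  row -> (start_ts, primary_row)
    write: (row, commit_ts) -> start_ts
    Returns "committed", "rolled_back", or "wait".
    """
    start_ts, primary = lock[row]

    # Did the primary commit? Look for its write record for this transaction.
    commit_ts = next(
        (c for (r, c), s in write.items() if r == primary and s == start_ts),
        None,
    )
    if commit_ts is not None:
        # Roll forward: finish the secondary commit on the owner's behalf.
        write[(row, commit_ts)] = start_ts
        del lock[row]
        return "committed"

    primary_lock = lock.get(primary)
    if primary_lock is not None and primary_lock[0] == start_ts:
        if not lock_expired:
            return "wait"  # the owner may still be alive and about to commit
        # Remove the primary lock first so the owner can no longer commit.
        del lock[primary]
        data.pop((primary, start_ts), None)

    # No write record and no primary lock: the transaction is rolled back.
    lock.pop(row, None)
    data.pop((row, start_ts), None)
    return "rolled_back"
```

Because every decision is made by inspecting or changing the primary row, two readers that race to clean up the same transaction reach the same outcome, assuming each primary-row step is atomic in the underlying store.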
Key idea
Percolator builds multi-row snapshot isolation on single-row atomicity by making the commit of one primary row the commit point and letting later readers lazily repair the rest.