Distributed Transactions

Keeping many independent services in agreement when one logical change spans all of them.

The problem

A distributed transaction is one logical unit of work that touches data living in two or more independent services or databases. The danger is a partial failure: the first service commits, the second crashes, and now the system is in a state that no single owner can fix.

We want the classic guarantee of atomicity, meaning all parts commit or none do, but there is no shared lock manager across the network to enforce it cheaply.
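To make the partial failure concrete, here is a minimal sketch using two in-memory objects as stand-ins for independent services. The `Store` class, the `naive_transfer` function, and the service names are hypothetical and exist only for illustration; a real service would commit to durable storage over a network.

```python
class Store:
    """Stand-in for one independent service with its own local commit."""

    def __init__(self, name, balance, healthy=True):
        self.name = name
        self.balance = balance
        self.healthy = healthy

    def apply(self, delta):
        # Simulate a crash or network failure before the local commit.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        # Local commit: final as far as this store is concerned.
        self.balance += delta


def naive_transfer(source, target, amount):
    source.apply(-amount)  # first service commits
    target.apply(+amount)  # second service may fail after that commit


accounts = Store("accounts", 100)
ledger = Store("ledger", 0, healthy=False)

try:
    naive_transfer(accounts, ledger, 30)
except ConnectionError:
    pass  # the caller learns of the failure, but the debit already happened

# The debit was applied and the credit was not, so 30 units are unaccounted for.
total = accounts.balance + ledger.balance
```

Neither store did anything wrong locally, which is why no single owner can repair the result on its own.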

Why it is hard

Each store runs its own local transactions and keeps its own clock, so there is no shared view of order or time.
The network can drop, delay, or duplicate any message.
A coordinator can fail right after deciding but before telling everyone.

The two families of solutions

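Synchronous agreement, such as two-phase commit (2PC), has every participant lock its resources and vote before anyone commits. It is strongly consistent, but participants that have already voted yes block, still holding their locks, if the coordinator dies before announcing the outcome.
Asynchronous compensation, such as the saga pattern, commits each step locally and, if a later step fails, undoes the completed steps with compensating actions. It trades isolation for availability: other work can observe the intermediate states.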
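The first family can be sketched as follows. This is a simplified, in-process illustration of two-phase commit, not a production protocol: the `Participant` class and `two_phase_commit` function are hypothetical names, and the sketch leaves out timeouts, durable logging, and recovery.

```python
class Participant:
    """Hypothetical participant that votes in phase 1 and obeys in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: lock resources and vote. A "yes" vote is a promise
        # to commit later if the coordinator says so.
        if self.can_commit:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)

    # A real coordinator would durably record the decision here.
    # If it crashes at this point, prepared participants are stuck
    # holding their locks until it recovers: the blocking problem.

    # Phase 2: broadcast the single decision to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


happy = [Participant("orders"), Participant("payments")]
happy_result = two_phase_commit(happy)

unhappy = [Participant("orders"), Participant("payments", can_commit=False)]
unhappy_result = two_phase_commit(unhappy)
```

In the second run a single "no" vote aborts every participant, including the one that had already prepared, which is the all-or-nothing behavior the protocol exists to provide.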

Many service-oriented systems prefer eventual consistency through sagas, because holding locks across services for the duration of a vote hurts availability.
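A saga can be sketched as a list of steps, each paired with the action that undoes it. The `run_saga` function and the step names below are hypothetical, and the sketch assumes compensations always succeed; a real system would need to retry them and make them idempotent.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    Each action is treated as an independent local commit. If one fails,
    the compensations of the already completed steps run in reverse order.
    Returns True if every step succeeded, False if the saga was rolled back.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo the most recent successful step first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def ship_order():
    # Simulated failure in the third service.
    raise RuntimeError("shipping service unavailable")


steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (ship_order, lambda: log.append("cancel shipment")),
]

ok = run_saga(steps)
# log is now:
# ["reserve stock", "charge card", "refund card", "release stock"]
```

Between "charge card" and "refund card" the charge is visible to other readers, which is the loss of isolation that the saga approach accepts in exchange for not holding locks.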

Key idea

A distributed transaction must turn many local commits into one all-or-nothing outcome despite partial failure.
