The problem
A distributed transaction is one logical unit of work that touches data living in two or more independent services or databases. The danger is a partial failure: the first service commits, the second crashes, and now the system is in a state that no single owner can fix.
We want the classic guarantee of atomicity, meaning all parts commit or none do, but there is no shared lock manager across the network to enforce it cheaply.
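To make the partial failure concrete, here is a minimal sketch in Python. The `Store` class, the `naive_transfer` function, and the store names are hypothetical stand-ins invented for illustration. They model two independent stores with plain in-memory lists, not real databases.

```python
class Store:
    """Hypothetical in-memory stand-in for one independent service or database."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.committed = []

    def commit(self, record):
        # Simulate a crash before the local commit takes effect.
        if self.fail_on_commit:
            raise ConnectionError(f"{self.name} crashed")
        self.committed.append(record)


def naive_transfer(first, second, record):
    """Commit to each store in turn, with no coordination between them."""
    first.commit(record)   # the first local commit succeeds
    second.commit(record)  # the second may fail, leaving the first in place


orders = Store("orders")
payments = Store("payments", fail_on_commit=True)
try:
    naive_transfer(orders, payments, "order-1")
except ConnectionError:
    pass

# The stores now disagree: one holds the record, the other does not.
print(orders.committed, payments.committed)
```

Neither store can detect the inconsistency by looking only at its own data, which is why some cross-store protocol is needed.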
Why it is hard
- Each store has its own local transaction and its own clock.
- The network can drop, delay, or duplicate any message.
- A coordinator can fail right after deciding but before telling everyone.
The two families of solutions
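- Synchronous agreement, such as two-phase commit (2PC): participants lock their resources and vote, and the coordinator commits only if every vote is yes. It is strongly consistent, but participants that have voted yes stay blocked, still holding their locks, if the coordinator dies before announcing its decision.
- Asynchronous compensation, such as the saga pattern: each service commits locally, and on failure the steps that already committed are undone by compensating actions. It trades isolation for availability, because intermediate states are visible to other transactions until the saga completes or is compensated.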
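The voting and decision phases of two-phase commit can be sketched as follows. This is a single-process illustration under simplifying assumptions: the `Participant` class and the `two_phase_commit` function are hypothetical names, there is no network, and nothing is written to a durable log, which a real implementation would need.

```python
class Participant:
    """Hypothetical participant; a real one would hold locks and keep a durable log."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: lock resources and vote. A yes vote is a promise to commit later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all aborted."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # If the coordinator crashed at this point, every participant in the
    # "prepared" state would be stuck holding its locks.
    # Phase 2 (decision): send the same outcome to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


a, b = Participant("a"), Participant("b")
committed = two_phase_commit([a, b])

c, d = Participant("c"), Participant("d", can_commit=False)
aborted = two_phase_commit([c, d])
print(committed, a.state, b.state, aborted, c.state, d.state)
```

A single no vote aborts everyone, including participants that were ready to commit. That is the all-or-nothing outcome, and the gap between the two phases is where the blocking problem lives.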
Many modern service-oriented systems prefer sagas and accept eventual consistency, because holding cross-service locks hurts availability.
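A saga can be sketched as a list of (action, compensation) pairs executed in order. The `run_saga` function and the step names below are hypothetical, and the sketch assumes that compensations themselves never fail, which real systems cannot take for granted.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure."""
    done = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo the steps that already committed, most recent first.
            for undo in reversed(done):
                undo()
            return False
        done.append(compensation)
    return True


log = []


def reserve_stock():
    log.append("stock reserved")


def release_stock():
    log.append("stock released")


def charge_card():
    raise RuntimeError("payment declined")


def refund_card():
    log.append("card refunded")


ok = run_saga([(reserve_stock, release_stock), (charge_card, refund_card)])
print(ok, log)
```

Between "stock reserved" and "stock released", other readers could observe the reservation. That window is the isolation a saga gives up in exchange for not holding locks across services.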
Key idea
A distributed transaction must turn many local commits into one all-or-nothing outcome despite partial failure.