← Lessons

quiz vs the machine

Gold1400

System Design

Distributed Transactions

Keeping many independent services in agreement when one logical change spans all of them.

5 min read · core · beat Gold to climb

The problem

A distributed transaction is one logical unit of work that touches data living in two or more independent services or databases. The danger is a partial failure: the first service commits, the second crashes, and now the system is in a state that no single owner can fix.

We want the classic guarantee of atomicity, meaning all parts commit or none do, but there is no shared lock manager across the network to enforce it cheaply.

Why it is hard

  • Each store has its own local transaction and its own clock.
  • The network can drop, delay, or duplicate any message.
  • A coordinator can fail right after deciding but before telling everyone.

The two families of solutions

  • Synchronous agreement like two phase commit locks resources and votes before committing. It is strongly consistent but blocks if the coordinator dies.
  • Asynchronous compensation like the saga pattern runs local commits and undoes them with compensating actions on failure. It trades isolation for availability.

Most modern systems prefer eventual consistency through sagas because holding cross service locks hurts availability.

Key idea

A distributed transaction must turn many local commits into one all or nothing outcome despite partial failure.

Check yourself

Answer to earn rating on the learn ladder.

1. What core guarantee does a distributed transaction try to provide?

2. Why is a partial failure so dangerous?