System Design

Multi AZ and Multi Region

Choosing how far apart your copies live and paying the latency it costs.

6 min read · advanced

Two levels of independence

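Cloud regions are divided into availability zones (AZs). These are isolated data centers (or groups of data centers) within a region, each with its own power, cooling, and networking, and connected to one another by fast, low latency links. Going multi region means spreading a service across two or more geographically distant regions.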

  • Multi AZ protects against the failure of a single zone, such as a power, cooling, or network outage in one data center.
  • Multi region protects against an entire region failing, for example in a regional disaster, and lets you serve users from a region closer to them.
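To make the two levels concrete, here is a minimal Python sketch that checks which replicas survive a zone outage versus a region outage. The `Replica` type, the `survivors` function, and the region and zone names are hypothetical and exist only for illustration. Zone names are assumed to be unique across regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One copy of the data, placed in a region and zone (hypothetical names)."""
    name: str
    region: str
    zone: str  # assumed unique across regions, e.g. "region-a-1"


def survivors(replicas, failed_zone=None, failed_region=None):
    """Return the replicas still running after a zone or region outage."""
    return [
        r for r in replicas
        if r.zone != failed_zone and r.region != failed_region
    ]


# Multi AZ: two zones in one region.
multi_az = [
    Replica("db-1", "region-a", "region-a-1"),
    Replica("db-2", "region-a", "region-a-2"),
]
# Multi region: add a copy in a distant region.
multi_region = multi_az + [Replica("db-3", "region-b", "region-b-1")]

print([r.name for r in survivors(multi_az, failed_zone="region-a-1")])      # ['db-2']
print([r.name for r in survivors(multi_az, failed_region="region-a")])      # []
print([r.name for r in survivors(multi_region, failed_region="region-a")])  # ['db-3']
```

The multi AZ placement rides out a zone outage but loses everything when its only region goes down. Only the placement with a copy in a second region survives that.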

The latency wall

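Zones in a region are close enough that synchronous replication between them is practical. Round trips between zones typically take a few milliseconds or less, so each write can wait for the other zones to confirm, and the copies stay consistent with little added delay. Regions are often hundreds or thousands of miles apart. A synchronous write would have to wait for a cross region round trip of tens to hundreds of milliseconds, and every write would pay that cost. Most multi region designs therefore replicate asynchronously across regions: they acknowledge writes locally and accept a small replication lag.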
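The latency wall can be sketched as simple arithmetic. A write is acknowledged after the local commit, plus the wait for however many replica confirmations the design requires. This is a minimal Python model, not a real replication protocol. `write_latency_ms` is a hypothetical helper, and the millisecond figures are illustrative assumptions, not measurements.

```python
def write_latency_ms(local_commit_ms, replica_rtts_ms, acks_required):
    """Hypothetical model of the time until a write is acknowledged.

    local_commit_ms: time to commit on the primary itself.
    replica_rtts_ms: round-trip time from the primary to each replica.
    acks_required: replica confirmations to wait for before acknowledging;
        0 models asynchronous replication (changes shipped in the background).
    Replica apply time is ignored for simplicity.
    """
    if not 0 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 0 and the replica count")
    if acks_required == 0:
        return local_commit_ms
    # Replicas confirm in parallel, so the wait is the round trip of the
    # acks_required-th fastest replica.
    return local_commit_ms + sorted(replica_rtts_ms)[acks_required - 1]


# Illustrative numbers only: 2 ms local commit, 1 ms to another zone,
# 80 ms to another region.
print(write_latency_ms(2, [1], 1))   # synchronous multi AZ: 3
print(write_latency_ms(2, [80], 1))  # synchronous multi region: 82
print(write_latency_ms(2, [80], 0))  # asynchronous multi region: 2
```

Setting `acks_required` below the replica count models quorum-style designs that wait only for the fastest confirmations. For example, `write_latency_ms(2, [1, 80], 1)` returns 3 because the nearby zone answers first. In that case the distant copy is effectively replicated asynchronously.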

The consistency tax

Asynchronous cross region replication means the standby region can lag slightly behind the primary. If the primary region fails, any writes it acknowledged but had not yet shipped are lost, so your cross region recovery point objective (RPO) is greater than zero. Running active active, with both regions accepting writes, adds a harder problem: resolving conflicting writes to the same data made in two regions at the same time.
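Both halves of the consistency tax can be sketched in a few lines of Python. `failover_loss` shows which writes disappear when the primary region fails before the standby catches up, which is the data loss behind a nonzero RPO. `last_writer_wins` shows one common but lossy way to resolve an active active conflict. It is a swapped-in example rather than the only strategy, and it assumes the regions' clocks are closely synchronized. Both functions and the write format are hypothetical.

```python
def failover_loss(primary_log, replicated_upto):
    """Writes lost if the primary region fails right now.

    primary_log: writes acknowledged by the primary, in commit order.
    replicated_upto: how many of those writes the standby has applied.
    Everything after that point never reached the standby.
    """
    return primary_log[replicated_upto:]


def last_writer_wins(write_a, write_b):
    """Resolve two conflicting writes to the same key (one possible strategy).

    Each write is a (timestamp, region, value) tuple. The later timestamp wins;
    ties fall back to the region name so every region picks the same winner.
    Assumes closely synchronized clocks. The losing write is discarded.
    """
    return max(write_a, write_b, key=lambda w: (w[0], w[1]))


# The standby had applied 2 of 4 writes when the primary region went down.
print(failover_loss(["w1", "w2", "w3", "w4"], 2))  # ['w3', 'w4']

# Both regions updated the same key at nearly the same moment.
a = (100, "region-a", "blue")
b = (105, "region-b", "green")
print(last_writer_wins(a, b))  # (105, 'region-b', 'green')
print(last_writer_wins(b, a) == last_writer_wins(a, b))  # True
```

Last writer wins ensures that both regions converge on the same value. However, the losing region's update vanishes without notice. For that reason, some designs merge conflicts in application logic or use conflict-free replicated data types instead.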

Choosing the level

Multi AZ is the standard baseline for production services and is relatively cheap. Multi region is for services that need the highest availability or have strict geographic requirements. It costs far more in operational complexity and cross region data transfer.

Key idea

Multi AZ cheaply survives a data center loss, while multi region survives a region loss but pays a latency and consistency tax.

Check yourself

1. Why do most multi region designs use asynchronous replication?

2. What does multi AZ protect against that a single zone does not?

3. What is the cost of asynchronous cross region replication on failover?