Two levels of independence
Cloud regions are divided into availability zones (AZs), each an isolated data center or group of data centers within the region, connected to one another by fast, low-latency links. Going multi-region means spreading across geographically distant regions.
- Multi-AZ protects against a single data center failure: a power, cooling, or network outage in one zone.
- Multi-region protects against the loss of an entire region, such as a regional disaster, and lets you serve users from a region closer to them.
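A back-of-the-envelope way to see why redundancy helps at either level is to treat each location as failing independently. The sketch below does that; the 99.9% figure and the function name are illustrative assumptions, not numbers from any provider.

```python
def combined_availability(single: float, copies: int) -> float:
    """Probability that at least one of `copies` independent locations is up."""
    return 1 - (1 - single) ** copies

# Hypothetical figure: each zone is up 99.9% of the time.
zone = 0.999
for n in (1, 2, 3):
    print(f"{n} zone(s): {combined_availability(zone, n):.7%}")
```

Real failures are never perfectly independent, so treat the result as an upper bound. The same arithmetic applies to regions, which are isolated from one another more strongly than zones are.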
The latency wall
Zones in a region are close enough that synchronous replication between them is practical: a write is acknowledged only after every copy has it, and the added delay is small. Regions are hundreds or thousands of miles apart, so synchronous replication adds a full cross-region round trip to every write. Most multi-region designs therefore use asynchronous replication across regions, accepting some replication lag.
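The size of the wall can be estimated from physics alone. The sketch below assumes light travels through optical fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum) and uses made-up distances; real links add routing and processing time on top of this floor.

```python
# Rough lower bound on round-trip time over optical fiber.
# Assumption: signal speed in fiber is about 200,000 km/s, i.e. 200 km per ms.
FIBER_KM_PER_MS = 200.0

def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip for one synchronous acknowledgement."""
    return 2 * distance_km / FIBER_KM_PER_MS

# Hypothetical distances, not measurements of any real provider.
for label, km in [("zone to zone", 50), ("region to region", 4000)]:
    print(f"{label}: >= {min_round_trip_ms(km):.1f} ms per synchronous write")
# zone to zone: >= 0.5 ms per synchronous write
# region to region: >= 40.0 ms per synchronous write
```

Under these assumptions, a synchronous write between regions costs tens of milliseconds before any work is done, which is why asynchronous replication is the usual choice at that distance.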
The consistency tax
Asynchronous cross-region replication means the standby region can be slightly behind the primary. On failover, any writes that had not yet replicated are lost, so your cross-region recovery point objective (RPO) is above zero. Going active-active across regions adds a harder problem: resolving conflicting writes made to the same data in two regions at once.
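Both costs can be shown with a toy model. The sketch below is illustrative only: the `Write` record, the region labels, and the timestamps are hypothetical, and last-writer-wins is just one common conflict policy, chosen here because it is simple. It also assumes clocks in the two regions are comparable, which real systems cannot take for granted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp_ms: int  # wall-clock time at the writing region (assumed comparable)
    region: str

def lost_on_failover(primary_log: list[Write], replicated_count: int) -> list[Write]:
    """Writes acknowledged by the primary but not yet applied by the standby."""
    return primary_log[replicated_count:]

def last_writer_wins(a: Write, b: Write) -> Write:
    """Keep the write with the later timestamp; break ties by region name."""
    return max(a, b, key=lambda w: (w.timestamp_ms, w.region))

# Active-passive: the standby had applied two of three writes when the primary failed.
log = [
    Write("cart:42", "1 item", 1000, "us-east"),
    Write("cart:42", "2 items", 1200, "us-east"),
    Write("cart:42", "3 items", 1350, "us-east"),
]
lost = lost_on_failover(log, replicated_count=2)
print([w.value for w in lost])  # ['3 items']

# Active-active: two regions update the same key within a few milliseconds.
east = Write("profile:7", "name=Ann", 5000, "us-east")
west = Write("profile:7", "name=Anne", 5003, "eu-west")
print(last_writer_wins(east, west).value)  # name=Anne
```

Note that last-writer-wins resolves the conflict by silently discarding one region's write, which is acceptable for some data and unacceptable for other data. That trade-off is part of what makes active-active hard.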
Choosing the level
Multi-AZ is the default for serious services and is relatively cheap. Multi-region is for the highest availability targets or strict geographic requirements, such as keeping data or users within a particular area, and costs far more in complexity and data transfer.
Key idea
Multi-AZ cheaply survives a data center loss, while multi-region survives a region loss but pays a latency and consistency tax.