Read Replicas for Scale

Offloading Reads

Many workloads read far more than they write. A read replica is a copy of the primary database that serves read queries, letting the primary focus on writes. The application sends writes to the primary and spreads reads across one or more replicas.
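The split between write and read traffic can be sketched in a few lines. This is a minimal illustration, not a real driver API: the class name ReplicaRouter, its methods, and the node names are hypothetical, and round-robin is just one simple way to spread reads.

```python
import itertools


class ReplicaRouter:
    """Sends writes to the primary and spreads reads round-robin over replicas.

    Hypothetical helper for illustration; nodes are plain strings here,
    where a real application would hold connection objects.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._read_cycle = itertools.cycle(list(replicas) or [primary])

    def for_write(self):
        # Every write goes to the single primary.
        return self.primary

    def for_read(self):
        # Each call hands out the next replica in turn.
        return next(self._read_cycle)


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
write_target = router.for_write()
read_targets = [router.for_read() for _ in range(4)]
```

Here write_target is always the primary, while read_targets alternates between the two replicas.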

The primary streams its changes to each replica. Replicas apply those changes in order, staying close to current but not always exactly current.
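A toy model can make the change stream concrete. It assumes the primary keeps an ordered in-memory log and that a replica tracks how many entries it has applied; real databases stream a write-ahead or binary log instead, and the names Primary, Replica, catch_up, and lag are invented for this sketch.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, primary, max_entries=None):
        """Apply pending log entries in order, optionally only the first few."""
        pending = primary.log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag(self, primary):
        # Lag measured in unapplied log entries.
        return len(primary.log) - self.applied


primary = Primary()
replica = Replica()
primary.write("balance", 100)
primary.write("balance", 80)

replica.catch_up(primary, max_entries=1)  # replica has seen only the first write
stale_value = replica.data["balance"]     # older value than the primary holds
replica.catch_up(primary)                 # now fully caught up
```

After the partial catch-up the replica is one entry behind and still returns the older balance; after the second call its data matches the primary.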

Replication Lag

Because replication is usually asynchronous, a replica can fall behind the primary by milliseconds or more. This replication lag creates a subtle trap: a user writes a value, then immediately reads from a replica that has not yet applied the write, and sees stale data.

Common fixes:

Read your own writes: route a user's reads back to the primary for a short window after that user writes.
Bounded staleness: refuse to read from any replica whose lag exceeds a threshold.
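Both fixes can be combined in one routing decision. The sketch below is a hedged illustration: the class LagAwareRouter, the default thresholds of 2 seconds and 1 second, and the assumption that per-replica lag is available in seconds are all choices made for this example, not values from any particular database.

```python
import time


class LagAwareRouter:
    """Hypothetical router combining read-your-own-writes and bounded staleness."""

    def __init__(self, sticky_seconds=2.0, max_lag_seconds=1.0, clock=time.monotonic):
        self.sticky_seconds = sticky_seconds    # how long a writer stays on the primary
        self.max_lag_seconds = max_lag_seconds  # replicas lagging more than this are refused
        self.clock = clock                      # injectable so the logic can be tested
        self._last_write = {}                   # user_id -> time of most recent write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def choose(self, user_id, replica_lags):
        """Return "primary" or a replica name.

        replica_lags maps replica name -> current lag in seconds.
        """
        last = self._last_write.get(user_id)
        # Read your own writes: recent writers read from the primary.
        if last is not None and self.clock() - last < self.sticky_seconds:
            return "primary"
        # Bounded staleness: keep only replicas within the lag threshold.
        fresh = {name: lag for name, lag in replica_lags.items()
                 if lag <= self.max_lag_seconds}
        if not fresh:
            return "primary"
        # Among acceptable replicas, pick the least-lagged one.
        return min(fresh, key=fresh.get)
```

Picking the least-lagged replica is one option; spreading reads evenly across all acceptable replicas would balance load better. Falling back to the primary when every replica is too stale keeps reads correct but pushes load back onto the node the replicas were meant to relieve.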

What It Does Not Solve

Replicas multiply read capacity, but every replica still holds the full dataset and must apply every write. So replicas do not help write throughput and do not reduce storage per node. When writes or data size become the bottleneck, you need sharding instead.
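A rough back-of-the-envelope model shows why. It assumes, purely for illustration, that each node has a fixed operations budget per second and that applying a write costs the same as serving a read; the function name and the numbers are made up.

```python
def read_capacity(replicas, node_ops_per_sec, writes_per_sec):
    """Total reads per second the replicas can serve under the simple model.

    Every replica must apply every write, so only the leftover budget
    on each replica is available for reads.
    """
    spare_per_replica = max(node_ops_per_sec - writes_per_sec, 0)
    return replicas * spare_per_replica


one_replica = read_capacity(1, 10_000, 2_000)
four_replicas = read_capacity(4, 10_000, 2_000)
write_bound = read_capacity(4, 10_000, 10_000)
```

With 2,000 writes per second against a 10,000 operation budget, one replica serves 8,000 reads per second and four serve 32,000. Once writes consume the whole budget, adding replicas yields no read capacity at all, and the write rate itself never improves because it appears unchanged on every node.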

Key idea

Read replicas scale read capacity by copying data, but asynchronous lag risks stale reads, and replicas do nothing for write throughput.
