What Replication Lag Is
In a setup with one primary and one or more read replicas, writes go to the primary and are streamed to the replicas. Replication lag is the delay between a write committing on the primary and a replica reflecting it.
Why It Happens
- The replica must receive the change log over the network and then apply it.
- A heavy write burst can outpace how fast a replica applies it.
- A slow or single-threaded apply step builds a backlog.
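The backlog effect can be put into rough numbers. The sketch below assumes constant write and apply rates, which real systems rarely have, and the function names are illustrative rather than taken from any database.

```python
def backlog_after(write_rate: float, apply_rate: float, seconds: float) -> float:
    """Unapplied changes after `seconds` of steady load, starting from an empty backlog."""
    return max(0.0, float(write_rate - apply_rate) * seconds)


def seconds_to_drain(backlog: float, apply_rate: float, write_rate: float = 0.0) -> float:
    """Time to clear `backlog` once incoming writes settle at `write_rate`."""
    spare = apply_rate - write_rate  # apply capacity left over for catching up
    if spare <= 0:
        return float("inf")          # the replica never catches up
    return backlog / spare


# A 10 second burst of 1000 writes/s against a replica that applies 800/s.
burst = backlog_after(write_rate=1000, apply_rate=800, seconds=10)
print(burst)                                                    # 2000.0
print(seconds_to_drain(burst, apply_rate=800, write_rate=300))  # 4.0
```

Under these assumptions the burst leaves 2000 changes unapplied, and the replica needs another 4 seconds to catch up after writes drop to 300 per second.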
The Symptom
A user writes data, immediately reads from a replica, and sees the old value. This is eventual consistency surfacing as a read-after-write surprise.
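The symptom can be reproduced with a toy in-memory model. This is only a sketch: the `Primary` and `Replica` classes are hypothetical stand-ins, and the replica applies changes only when `catch_up` is called, which plays the role of the lag.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered change log of (key, value) pairs

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, primary):
        """Apply every log entry this replica has not applied yet."""
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def read(self, key):
        return self.data.get(key)


primary, replica = Primary(), Replica()
primary.write("name", "old")
replica.catch_up(primary)      # replica is in sync

primary.write("name", "new")   # committed on the primary only
stale = replica.read("name")   # "old": the change has not been applied yet

replica.catch_up(primary)
fresh = replica.read("name")   # "new": the replica has converged
```

The replica is never wrong for long, but a read that lands between the write and the next `catch_up` returns the old value.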
How To Cope
- Provide read-your-writes behavior by routing a user's reads to the primary for a short time after that user writes.
- Set a freshness bound and fall back to the primary if a replica is too far behind.
- Monitor lag as a first-class metric and alert on spikes.
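The first two tactics above can be combined in a small routing helper. The sketch below is one possible shape, not a reference implementation: `ReadRouter`, its parameters, and the default thresholds are all assumptions, and the caller is expected to supply the replica's current lag from whatever measurement the database exposes.

```python
import time


class ReadRouter:
    """Chooses "primary" or "replica" for each read (hypothetical helper)."""

    def __init__(self, pin_seconds=5.0, max_lag_seconds=2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds          # how long a writer stays on the primary
        self.max_lag_seconds = max_lag_seconds  # freshness bound for replica reads
        self.clock = clock                      # injectable so tests can control time
        self.last_write = {}                    # user_id -> time of that user's last write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def choose(self, user_id, replica_lag_seconds):
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return "primary"  # read your writes: this user wrote recently
        if replica_lag_seconds > self.max_lag_seconds:
            return "primary"  # replica is staler than the freshness bound
        return "replica"


# Usage with a fake clock so the example is deterministic.
now = [100.0]
router = ReadRouter(clock=lambda: now[0])

router.record_write("alice")
just_wrote = router.choose("alice", replica_lag_seconds=0.1)  # "primary"

now[0] += 10                                                  # pin window has passed
later = router.choose("alice", replica_lag_seconds=0.1)       # "replica"
lagging = router.choose("alice", replica_lag_seconds=3.0)     # "primary"
```

Pinning by time is a simple choice. It sends some reads to the primary that a caught-up replica could have served, so the pin window trades primary load against the risk of a stale read.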
Key Idea
Replication lag is the delay before replicas reflect committed writes. It causes stale reads, which read-your-writes routing can hide from the user who made the write.