Filling The Past
When you add a new column or table, new writes populate it, but existing rows stay empty. A backfill populates those historical rows. The challenge is doing this on a large table without locking it or saturating the database.
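A minimal illustration of the gap a backfill closes. It uses Python's built-in sqlite3 module as a stand-in database, and the users table and email_domain column are hypothetical. After the column is added, the old rows read NULL, and only the row written afterwards carries a value.

```python
import sqlite3

# Hypothetical table that already holds data before the new column exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",)])

# Add the column: existing rows get NULL.
conn.execute("ALTER TABLE users ADD COLUMN email_domain TEXT")

# New writes populate it (here the application sets it explicitly).
conn.execute("INSERT INTO users (email, email_domain) VALUES (?, ?)",
             ("c@example.com", "example.com"))

rows = conn.execute("SELECT id, email_domain FROM users ORDER BY id").fetchall()
print(rows)  # [(1, None), (2, None), (3, 'example.com')]
```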
Batched Processing
The safe approach processes rows in bounded batches rather than one giant statement.
- Walk the table by primary key ranges, a few thousand rows at a time.
- Update each batch in its own short transaction.
- Pause briefly between batches to let normal traffic breathe.
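A single update over millions of rows would hold its locks until it commits and bloat the transaction log. Batching keeps each step small, and because every batch commits on its own, an interrupted run can resume from the last completed batch.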
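The batching steps above can be sketched as follows. This is a minimal example under stated assumptions: Python's sqlite3 stands in for the real database, and the users table, email_domain column, and backfill_in_batches function are hypothetical. Each primary-key range is updated in its own short transaction, with a pause between batches.

```python
import sqlite3
import time

def backfill_in_batches(conn, batch_size=1000, pause_s=0.05):
    """Hypothetical job: fill users.email_domain one primary-key range at a time."""
    start, max_id = conn.execute(
        "SELECT COALESCE(MIN(id), 1), COALESCE(MAX(id), 0) FROM users").fetchone()
    while start <= max_id:
        end = start + batch_size - 1
        # `with conn` commits on success and rolls back on error,
        # so each key range runs in its own short transaction.
        with conn:
            conn.execute(
                "UPDATE users SET email_domain = substr(email, instr(email, '@') + 1) "
                "WHERE id BETWEEN ? AND ?",
                (start, end),
            )
        start = end + 1
        time.sleep(pause_s)  # give normal traffic room between batches

# Demo on an in-memory table with 2,500 pre-existing rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, email_domain TEXT)")
with conn:
    conn.executemany("INSERT INTO users (email) VALUES (?)",
                     [(f"u{i}@example.com",) for i in range(2500)])
backfill_in_batches(conn, batch_size=1000, pause_s=0)
```

Fixed-width key ranges are simple and cover the whole id space, but if ids are sparse, some batches will touch fewer rows than batch_size.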
Idempotence And Throttling
A good backfill is idempotent: a rerun skips rows that are already filled, so the job can resume after an interruption. It also throttles based on replication lag or load, slowing down when replicas fall behind. Tracking a cursor of the last processed key lets the job stop and restart safely.
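Idempotence, throttling, and a saved cursor might combine as in the sketch below. It again uses sqlite3 as a stand-in, and every name here is hypothetical: the users table, the backfill_progress table, and replica_lag_seconds, which is a stub returning 0 because SQLite has no replicas. In a real system you would wire it to your database's lag metric. The update only touches rows whose column is still NULL, and the cursor is saved in the same transaction as the batch, so a crash can never record progress for work that was not committed.

```python
import sqlite3
import time

BATCH_SIZE = 500

def replica_lag_seconds():
    """Hypothetical hook for replication lag; stubbed to 0 in this sketch."""
    return 0.0

def throttle(max_lag_s=5.0, base_pause_s=0.01):
    # Back off while replicas are behind, then take a short base pause.
    while replica_lag_seconds() > max_lag_s:
        time.sleep(1.0)
    time.sleep(base_pause_s)

def load_cursor(conn, job):
    row = conn.execute(
        "SELECT last_id FROM backfill_progress WHERE job = ?", (job,)).fetchone()
    return row[0] if row else 0

def run_backfill(conn, job="email_domain"):
    conn.execute("CREATE TABLE IF NOT EXISTS backfill_progress "
                 "(job TEXT PRIMARY KEY, last_id INTEGER NOT NULL)")
    conn.commit()
    last_id = load_cursor(conn, job)  # resume from saved progress, if any
    while True:
        # Next batch of keys after the cursor, in key order.
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, BATCH_SIZE))]
        if not ids:
            break
        with conn:
            # Only NULL rows are written, so rerunning a batch is harmless.
            conn.execute(
                "UPDATE users SET email_domain = substr(email, instr(email, '@') + 1) "
                "WHERE id BETWEEN ? AND ? AND email_domain IS NULL",
                (ids[0], ids[-1]))
            # Save the cursor atomically with the batch it describes.
            conn.execute(
                "INSERT OR REPLACE INTO backfill_progress (job, last_id) VALUES (?, ?)",
                (job, ids[-1]))
        last_id = ids[-1]
        throttle()

# Demo: row 1 was already written by new application code and must be kept.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, email_domain TEXT)")
with conn:
    conn.executemany("INSERT INTO users (email) VALUES (?)",
                     [(f"u{i}@example.com",) for i in range(1200)])
    conn.execute("UPDATE users SET email_domain = 'already.set' WHERE id = 1")
run_backfill(conn)
run_backfill(conn)  # second run resumes at the saved cursor and finds nothing left
```

Unlike fixed-width ranges, this version fetches the next BATCH_SIZE keys after the cursor, so batches stay full even when ids have gaps.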
Key idea
Backfills populate historical rows in small resumable batches ordered by key, throttling on load and staying idempotent so they never lock the table or overwhelm replicas.