Databases

Backfill Strategies

Populating a new column or table for existing rows without overloading the database.


Filling The Past

When you add a column or a new table, new writes populate it, but rows that predate the change have nothing in it. A backfill populates those historical rows. The challenge is doing it on a large table without locking it or saturating the database.

Batched Processing

The safe approach processes rows in bounded batches rather than one giant statement.

  • Walk the table by primary key ranges, a few thousand rows at a time.
  • Update each batch in its own short transaction.
  • Pause briefly between batches to let normal traffic breathe.

A single update over millions of rows would hold locks for its entire run and bloat the transaction log. Batching keeps each step small and resumable.
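
The batching steps above can be sketched as follows. This is a minimal illustration, not a production job: it assumes a hypothetical `users` table with an integer primary key `id`, an existing `email` column, and a new `email_lower` column to fill. SQLite is used only so the sketch is self-contained; the function name and parameters are made up for the example.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.05):
    """Fill users.email_lower (hypothetical column) one key range at a time.

    Returns the number of rows updated.
    """
    last_id = 0  # highest primary key processed so far
    total = 0
    while True:
        # Walk the primary key: fetch the ids of the next bounded batch.
        rows = conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break  # no keys left past the last processed one
        upper = rows[-1][0]

        # One short transaction per batch: committed when the block exits,
        # rolled back if the UPDATE raises.
        with conn:
            cur = conn.execute(
                "UPDATE users SET email_lower = lower(email) "
                "WHERE id > ? AND id <= ?",
                (last_id, upper),
            )
            total += cur.rowcount

        last_id = upper
        # Pause so normal traffic gets a turn between batches.
        time.sleep(pause_seconds)
    return total
```

Calling `backfill_in_batches(conn, batch_size=2000)` on an open `sqlite3` connection walks the whole table in ranges of at most 2000 keys. Ranging on `id` rather than using `OFFSET` keeps each batch query cheap regardless of how far the job has progressed.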

Idempotence And Throttling

A good backfill is idempotent: a rerun skips rows that are already filled, so the job can resume after an interruption. It also throttles based on replication lag or load: if replicas fall behind, it slows down. Tracking a cursor of the last processed key lets the job stop and restart safely.
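
One way these three properties might fit together is sketched below. Everything named here is an assumption for illustration: the `users` table with `id`, `email`, and `email_lower` columns, the `backfill_progress` table that stores the cursor, and the `get_lag_seconds` callback, which stands in for whatever replication-lag or load metric your database exposes. SQLite is used only to keep the sketch self-contained.

```python
import sqlite3
import time


def run_backfill(conn, get_lag_seconds, batch_size=1000,
                 max_lag_seconds=5.0, sleep=time.sleep):
    """Resumable, idempotent, throttled backfill of users.email_lower.

    get_lag_seconds is a hypothetical callback returning the current
    replication lag. Returns the number of rows filled by this run.
    """
    # Hypothetical progress table holding the cursor (last processed key).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backfill_progress "
        "(job TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    conn.commit()
    row = conn.execute(
        "SELECT last_id FROM backfill_progress WHERE job = 'email_lower'"
    ).fetchone()
    last_id = row[0] if row else 0  # resume from the saved cursor

    filled = 0
    while True:
        # Throttle: hold off while replicas are too far behind.
        while get_lag_seconds() > max_lag_seconds:
            sleep(1.0)

        ids = conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not ids:
            return filled
        upper = ids[-1][0]

        with conn:  # the batch and its cursor update commit together
            cur = conn.execute(
                "UPDATE users SET email_lower = lower(email) "
                # Idempotence: rows that already have a value are skipped.
                "WHERE id > ? AND id <= ? AND email_lower IS NULL",
                (last_id, upper),
            )
            filled += cur.rowcount
            conn.execute(
                "INSERT OR REPLACE INTO backfill_progress (job, last_id) "
                "VALUES ('email_lower', ?)",
                (upper,),
            )
        last_id = upper
```

Saving the cursor in the same transaction as the batch means a crash can never leave the cursor ahead of the data. The `IS NULL` guard is a second safety net: even if the cursor were lost and the job restarted from the first key, rows that were already filled would not be rewritten.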

Key idea

Backfills populate historical rows in small, resumable batches ordered by key, throttling on load and staying idempotent so they avoid long-held locks and do not overwhelm replicas.

Check yourself


1. Why are backfills done in small batches instead of one statement?

2. Why should a backfill be idempotent?

3. What signal commonly throttles a backfill?