Databases

Distributed Online Schema Change

Rolling a schema across nodes safely using intermediate states.

Changing Shape While Live

Adding a column or index to a distributed table is risky: different nodes learn about the change at different moments. If one node enforces a new constraint while another does not, data can become inconsistent. The solution is a staged rollout through intermediate schema states.
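To make the risk concrete, here is a minimal sketch of what can go wrong when an index goes live on one node while another node has never heard of it. The in-memory `table` and `index` dictionaries and the `knows_index` flag are hypothetical stand-ins for shared storage and for each node's view of the schema, not any real database API.

```python
# Hypothetical shared storage: a table and a secondary index on "email".
table = {}   # row_id -> email
index = {}   # email -> row_id

def insert(row_id, email, knows_index):
    """Insert a row; only a node that knows about the index maintains it."""
    table[row_id] = email
    if knows_index:
        index[email] = row_id

def delete(row_id, knows_index):
    """Delete a row; a node on the old schema leaves the index untouched."""
    email = table.pop(row_id)
    if knows_index:
        index.pop(email, None)

# Node A has loaded the new schema, node B is still on the old one.
insert(1, "a@example.com", knows_index=True)   # request served by node A
delete(1, knows_index=False)                   # request served by node B

# Index entries whose row no longer exists.
orphans = {email: rid for email, rid in index.items() if rid not in table}
print(orphans)  # {'a@example.com': 1}
```

The row is gone but its index entry survives, so a later index lookup would return a row that does not exist.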

The Intermediate States

Following the schema change protocol described for Google's F1 database, a new index starts out absent and passes through two intermediate states, delete only and write only, before becoming public.

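  • Delete only: deletes remove entries from the new index, but no new entries are written and reads ignore it.
  • Write only: inserts, updates, and deletes all maintain the index, but reads still ignore it.
  • Backfill: while the cluster is in write only, a background job adds index entries for rows that existed before the change.
  • Public: the index is complete and reads can use it.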
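The states above can be sketched as a small model in which each node decides what to do with the index based on its own view of the schema. This is an illustrative sketch only: `IndexState`, `Node`, and `backfill` are hypothetical names, and storage is a pair of shared in-memory dictionaries rather than a real distributed store.

```python
from enum import IntEnum

class IndexState(IntEnum):
    ABSENT = 0
    DELETE_ONLY = 1
    WRITE_ONLY = 2
    PUBLIC = 3

class Node:
    """One node acting on shared storage under its own view of the schema."""
    def __init__(self, state, table, index):
        self.state = state
        self.table = table    # row_id -> indexed column value
        self.index = index    # indexed value -> set of row_ids

    def insert(self, row_id, value):
        self.table[row_id] = value
        # Only write only and public nodes add index entries.
        if self.state >= IndexState.WRITE_ONLY:
            self.index.setdefault(value, set()).add(row_id)

    def delete(self, row_id):
        value = self.table.pop(row_id)
        # Every state from delete only onward removes index entries.
        if self.state >= IndexState.DELETE_ONLY:
            self.index.get(value, set()).discard(row_id)

    def lookup(self, value):
        # Only a public node may read the index; others scan the table.
        if self.state == IndexState.PUBLIC:
            return sorted(self.index.get(value, set()))
        return sorted(r for r, v in self.table.items() if v == value)

def backfill(table, index):
    """Add index entries for rows written before the index existed."""
    for row_id, value in table.items():
        index.setdefault(value, set()).add(row_id)

table, index = {1: "red"}, {}          # row 1 predates the index
writer = Node(IndexState.WRITE_ONLY, table, index)
writer.insert(2, "red")                # new write maintains the index
backfill(table, index)                 # row 1 gets its entry
reader = Node(IndexState.PUBLIC, table, index)
print(reader.lookup("red"))            # [1, 2]
```

Ordering the states as integers keeps the rules short: each later state does everything the earlier one does, plus one more thing.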

Two adjacent states are always compatible, so nodes lagging by one state can never corrupt data.

Why It Is Safe

By guaranteeing that the cluster only ever spans two adjacent compatible states at once, the protocol lets the change propagate node by node without a global stop. No long table lock is needed, so the database stays online.
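The node-by-node propagation can be simulated in a few lines. The sketch below assumes a hypothetical coordinator that moves to the next state only after every node has adopted the current one; how a real system enforces that barrier is not covered here, and the `rollout` function and its parameters are illustrative.

```python
import random

STATES = ["absent", "delete only", "write only", "public"]

def rollout(num_nodes, seed=0):
    """Advance every node through STATES, one state at a time."""
    rng = random.Random(seed)
    node_state = [0] * num_nodes   # each node's position in STATES
    history = []                   # states present after each node update
    for target in range(1, len(STATES)):
        order = list(range(num_nodes))
        rng.shuffle(order)         # nodes pick up the change in arbitrary order
        for n in order:
            node_state[n] = target
            present = set(node_state)
            # Invariant: the cluster never spans more than two adjacent states.
            assert max(present) - min(present) <= 1
            history.append([STATES[s] for s in sorted(present)])
        # Barrier: the loop moves to the next target only once all nodes
        # are on this one. A backfill would run here while every node is
        # in write only, before the move to public.
    return node_state, history

final, history = rollout(num_nodes=4)
print(history[0])    # ['absent', 'delete only']
print(history[-1])   # ['public']
```

No step in the simulation pauses the whole cluster; each node switches on its own, and the barrier between states is what keeps lagging nodes within one state of the rest.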

Key idea

Online schema change rolls through compatible intermediate states like delete only and write only, so nodes lagging by one state never corrupt data and the table stays available.

Check yourself

1. Why are intermediate schema states needed in a distributed database?

2. In the write only state, what happens to the new index?

3. What does the protocol guarantee about states present in the cluster at once?