← Lessons

quiz vs the machine

Platinum1860

Databases

Sharding Rebalancing

Moving data between shards without downtime or a stampede.

5 min read · advanced · beat Platinum to climb

When Shards Drift

A sharded database splits data across nodes by a partition key. Over time shards grow unevenly or you add capacity, so data must be rebalanced to keep load even. The challenge is moving partitions while serving live traffic.

Why Hashing by Node Count Fails

Mapping keys with a hash modulo the number of nodes is tempting, but adding one node changes the modulus and nearly every key moves. That triggers a massive reshuffle. Better schemes minimize movement.

Better Approaches

  • Consistent hashing places nodes and keys on a ring so adding a node moves only a small share of keys.
  • Fixed partitions create many more partitions than nodes up front, then assign whole partitions to nodes. Rebalancing just reassigns partitions, never resplits keys.
  • Move data in the background with throttling so user queries are not starved.

Key idea

Rebalancing should move as little data as possible, so consistent hashing or a fixed set of partitions beats hashing by node count, and copies run throttled in the background.

Check yourself

Answer to earn rating on the learn ladder.

1. Why is hashing keys modulo node count a poor sharding scheme?

2. How do fixed partitions ease rebalancing?

3. How should the actual data copy run during rebalancing?