← Lessons

quiz vs the machine

Platinum1760

System Design

Consistent Hashing for Storage

Distributing keys across nodes so adding or removing one moves little data.

6 min read · advanced · beat Platinum to climb

The rehashing problem

A naive way to place a key on one of N storage nodes is to take the key hash modulo N. The trouble is that changing N, by adding or removing a node, changes almost every assignment, forcing a massive data reshuffle.

The ring

Consistent hashing maps both nodes and keys onto a circular hash space, a ring. A key is stored on the first node found by walking clockwise from the key position. When a node joins or leaves, only the keys between it and its neighbor move, so reshuffling is small and local.

Virtual nodes

A few physical nodes placed once each can split the ring unevenly, leaving some nodes overloaded. Assigning each physical node many virtual nodes, spread around the ring, smooths the distribution and lets bigger machines own more of the ring.

Replication on the ring

To replicate, a key is also stored on the next few distinct nodes clockwise. This places replicas on different machines while keeping placement deterministic, so any client can compute where a key lives without asking a coordinator.

Key idea

Consistent hashing places keys and nodes on a ring so that adding or removing a node moves only neighboring keys, and virtual nodes plus clockwise replicas give balanced, deterministic, low churn placement.

Check yourself

Answer to earn rating on the learn ladder.

1. What problem does consistent hashing solve over hash modulo N placement?

2. Why are virtual nodes used?

3. How are replicas placed on the ring?