The Most Important Decision
The shard key shapes everything about a sharded system. A poor choice creates hot spots, forces expensive cross shard queries, and is painful to change later. Two goals pull against each other: spread load evenly, yet keep frequently joined data on the same shard.
What Makes a Key Good
- High cardinality so data divides into many buckets.
- Even access distribution so no single shard gets disproportionate traffic.
- Locality so rows that are queried together share a shard.
Bad Choices and Why
- An auto incrementing id as a range based shard key sends all new writes to the last shard, creating a write hot spot.
- A timestamp does the same, concentrating today on one shard.
- A low cardinality field like country forces uneven splits.
Hashing for Spread
Hashing the key spreads writes uniformly but destroys range locality, so range scans must hit every shard. Choosing between hash and range partitioning is a tradeoff between even load and efficient ranges.
Key idea
A strong shard key has high cardinality and even access, balancing uniform load against the locality that keeps related rows together.