← Lessons

quiz vs the machine

Gold1460

Databases

The Primary Key Choice for Sharding

A good shard key spreads load evenly and keeps related data together to avoid hot spots.

5 min read · core · beat Gold to climb

The Most Important Decision

The shard key shapes everything about a sharded system. A poor choice creates hot spots, forces expensive cross shard queries, and is painful to change later. Two goals pull against each other: spread load evenly, yet keep frequently joined data on the same shard.

What Makes a Key Good

  • High cardinality so data divides into many buckets.
  • Even access distribution so no single shard gets disproportionate traffic.
  • Locality so rows that are queried together share a shard.

Bad Choices and Why

  • An auto incrementing id as a range based shard key sends all new writes to the last shard, creating a write hot spot.
  • A timestamp does the same, concentrating today on one shard.
  • A low cardinality field like country forces uneven splits.

Hashing for Spread

Hashing the key spreads writes uniformly but destroys range locality, so range scans must hit every shard. Choosing between hash and range partitioning is a tradeoff between even load and efficient ranges.

Key idea

A strong shard key has high cardinality and even access, balancing uniform load against the locality that keeps related rows together.

Check yourself

Answer to earn rating on the learn ladder.

1. Why is an auto incrementing id a poor range based shard key?

2. What does hashing a shard key sacrifice?