Shard Key Selection

The single most consequential choice in a sharded system.

The Highest Leverage Decision

The shard key determines which shard owns each row. It shapes load distribution, query routing, and future flexibility. Changing it later means re-partitioning the data set and moving existing rows between shards, which makes it the hardest choice to reverse. A good key has three properties.

High Cardinality

The key must have many distinct values. A boolean, or a status field with three values, cannot spread data across hundreds of shards: the number of distinct key values is an upper bound on the number of shards that can ever hold data.
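A minimal sketch of that cap, assuming a simple hash-modulo placement scheme. The helper names shard_for and shards_used, the shard count, and the sample values are illustrative assumptions, not any particular database's API.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (illustrative scheme only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shards_used(keys, num_shards: int) -> int:
    """Count how many distinct shards receive at least one key."""
    return len({shard_for(k, num_shards) for k in keys})

NUM_SHARDS = 100  # assumed cluster size

# Low cardinality: only three distinct values exist.
statuses = ["active", "suspended", "deleted"]
# High cardinality: one distinct value per user.
user_ids = [f"user-{i}" for i in range(10_000)]

print("shards used by status key:", shards_used(statuses, NUM_SHARDS))   # never more than 3
print("shards used by user id key:", shards_used(user_ids, NUM_SHARDS))
```

However many shards are added, the status key can never occupy more than three of them, while the user id key can keep spreading as the cluster grows.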

Even Distribution

Values should spread uniformly. A key dominated by one popular value, such as a country code where most users share one country, concentrates every row with that value on a single shard regardless of how many shards exist.
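The effect of a dominant value can be sketched as follows. The 80 percent share for one country, the shard count, and the helper shard_load are made-up assumptions for illustration, and the placement scheme is a plain hash-modulo.

```python
import hashlib
from collections import Counter

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (illustrative scheme only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shard_load(keys, num_shards: int) -> Counter:
    """Count how many rows land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)

NUM_SHARDS = 8  # assumed cluster size

# Hypothetical user table: 800 of 1,000 rows share one country code.
countries = ["US"] * 800 + ["DE"] * 50 + ["FR"] * 50 + ["JP"] * 50 + ["BR"] * 50

load = shard_load(countries, NUM_SHARDS)
hottest_shard, hottest_rows = load.most_common(1)[0]
print(f"hottest shard holds {hottest_rows} of {len(countries)} rows")
```

Because every row with the same key value hashes to the same shard, the hottest shard holds at least the 800 rows of the dominant country, and adding shards does not change that.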

Query Locality

The key should match how you query. If most queries filter by user id, sharding by user id means each query hits one shard. Shard by the wrong field and queries must scatter across all shards, erasing the benefit.
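A small in-memory sketch of the difference between a targeted query and a scatter query. The table layout, the four-shard cluster, and the functions find_by_user_id and find_by_email are hypothetical; a real router would also handle replicas, failures, and result merging.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size

def shard_for(key) -> int:
    """Route on the shard key with a stable hash (illustrative scheme only)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Each shard is a plain list of rows; the table is sharded by user_id.
shards = [[] for _ in range(NUM_SHARDS)]
for uid in range(1, 21):
    row = {"user_id": uid, "email": f"user{uid}@example.com"}
    shards[shard_for(uid)].append(row)

def find_by_user_id(user_id):
    """Filter on the shard key: the router knows the one shard to ask."""
    target = shard_for(user_id)
    rows = [r for r in shards[target] if r["user_id"] == user_id]
    return rows, 1  # rows found, shards touched

def find_by_email(email):
    """Filter on a non-key field: every shard must be asked."""
    rows, touched = [], 0
    for shard in shards:
        touched += 1
        rows.extend(r for r in shard if r["email"] == email)
    return rows, touched

print(find_by_user_id(7))                  # touches 1 shard
print(find_by_email("user7@example.com"))  # touches all 4 shards
```

Both calls return the same row, but the second one costs a request to every shard, and that cost grows with the size of the cluster.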

The Tension

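These goals can conflict. A monotonically increasing id has high cardinality but, under range-based partitioning, poor write distribution, because new ids all fall in the highest range and therefore land on one shard. A common remedy is a hashed key, which spreads writes, or a composite key, which preserves locality where it matters (for example, keeping all rows for one user together) while still spreading writes across shards.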
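The trade-off can be sketched by routing the same batch of recent inserts three ways. The range size, shard count, and the functions range_shard, hash_shard, and composite_shard are assumptions made for this example, not a specific system's behavior.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4
RANGE_SIZE = 1_000  # assumption: each shard owns a contiguous block of 1,000 ids

def stable_hash(value) -> int:
    """Deterministic hash so results do not vary between runs."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def range_shard(row_id: int) -> int:
    """Range partitioning: consecutive ids share a shard."""
    return min(row_id // RANGE_SIZE, NUM_SHARDS - 1)

def hash_shard(row_id: int) -> int:
    """Hashed key: consecutive ids scatter across shards."""
    return stable_hash(row_id) % NUM_SHARDS

def composite_shard(user_id: int, order_id: int) -> int:
    """Composite key (user_id, order_id): route on user_id only, so one
    user's rows stay together; order_id only orders rows within the shard."""
    return stable_hash(user_id) % NUM_SHARDS

recent_ids = range(3_000, 3_400)  # the 400 most recent inserts

range_writes = Counter(range_shard(i) for i in recent_ids)
hash_writes = Counter(hash_shard(i) for i in recent_ids)
user_42_shards = {composite_shard(42, order_id) for order_id in recent_ids}

print("range partitioning:", dict(range_writes))  # all 400 writes on shard 3
print("hashed key:", dict(hash_writes))
print("shards holding user 42's orders:", len(user_42_shards))  # 1
```

Range partitioning sends every new row to the last shard. Hashing the id spreads the writes but gives up range scans over consecutive ids. The composite key keeps each user's rows on one shard, so per-user queries stay local, while different users still spread across the cluster.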

Key idea

A good shard key has high cardinality, spreads load evenly, and matches query patterns so most queries touch a single shard.

Check yourself

1. Why does a low cardinality shard key fail?

2. What happens when queries do not filter on the shard key?

3. Why is shard key choice hard to reverse?