Why Shard
Sharding splits a dataset across multiple nodes so each node holds only a slice. This scales write throughput and storage capacity, unlike replication, where every replica holds a full copy of the data. The key question is how to decide which row goes to which shard.
Range Sharding
Range sharding assigns contiguous key ranges to shards. For example, keys starting with A through M go to shard one, and keys starting with N through Z go to shard two.
- Pro: Range scans are efficient because nearby keys sit together on the same shard.
- Con: Uneven distribution. If the key is a timestamp, all new writes land on the shard that owns the most recent range, creating a hot spot.
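The routing rule and the hot spot can be sketched in a few lines of Python. This is only an illustration: the names `SPLIT_POINTS`, `range_shard`, and `shards_for_scan`, the zero-based shard numbering, and the timestamp boundaries are all assumptions made for the example, not part of any particular system.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split point: keys below "N" go to shard 0, keys from "N" up go to shard 1.
SPLIT_POINTS = ["N"]

def range_shard(key, split_points=SPLIT_POINTS):
    """Return the index of the shard whose contiguous range contains `key`."""
    # bisect_right counts how many split points are <= key.
    return bisect_right(split_points, key)

def shards_for_scan(lo, hi, split_points=SPLIT_POINTS):
    """Return the shards a scan over the inclusive range [lo, hi] must contact."""
    first = range_shard(lo, split_points)
    last = range_shard(hi, split_points)
    return list(range(first, last + 1))

# Hot spot: with timestamp keys, every new write is newer than the last split point.
timestamp_splits = [1000, 2000]  # three shards: < 1000, 1000 to 1999, >= 2000
new_writes = Counter(range_shard(ts, timestamp_splits) for ts in range(2001, 2101))
```

Under these assumptions, a scan from "A" to "M" contacts only shard 0, while all 100 new timestamp writes are counted against the last shard.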
Hash Sharding
Hash sharding applies a hash function to the key and uses the result to pick a shard.
- Pro: Even spread. A good hash function scatters keys roughly uniformly, smoothing load across shards.
- Con: Range scans become expensive because adjacent keys land on different shards, so a scan must query every shard.
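A minimal sketch of hash routing, assuming string keys, a fixed shard count, and SHA-256 reduced modulo the number of shards. The names `NUM_SHARDS`, `hash_shard`, and `shards_for_scan` are hypothetical, and real systems may use a different hash function or mapping scheme.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count

def hash_shard(key, num_shards=NUM_SHARDS):
    """Map a string key to a shard by hashing it and reducing modulo the shard count."""
    # SHA-256 gives the same result on every run and machine. Python's built-in
    # hash() is randomized per process for strings, so it is unsuitable for routing.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shards_for_scan(lo, hi, num_shards=NUM_SHARDS):
    """A range scan cannot be narrowed: any shard may hold keys between lo and hi."""
    return list(range(num_shards))

# Sequential keys, which would pile onto one range shard, spread across all shards here.
load = Counter(hash_shard(f"user-{i}") for i in range(10_000))
```

Printing `load` should show each of the four shards holding roughly a quarter of the keys, while `shards_for_scan` always returns every shard regardless of the bounds.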
Choosing
Pick range sharding when ordered scans dominate, such as time series queries over a window. Pick hash sharding when point lookups dominate and hot spots are a concern. Many systems combine the two, hashing a high-level key and then range-partitioning within it.
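One way to picture the combined scheme is a composite key whose first part is hashed and whose second part is range-partitioned. The sketch below assumes rows keyed by a device ID and a timestamp. The names `locate`, `ranges_for_window`, `NUM_GROUPS`, and `TIME_SPLITS` are made up for illustration, and the layout is an assumption rather than a description of any specific database.

```python
import hashlib
from bisect import bisect_right

NUM_GROUPS = 4              # hypothetical number of hash groups
TIME_SPLITS = [1000, 2000]  # hypothetical timestamp boundaries within each group

def locate(device_id, timestamp, num_groups=NUM_GROUPS, time_splits=TIME_SPLITS):
    """Return (hash group, range index) for a row keyed by (device_id, timestamp)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    group = int.from_bytes(digest[:8], "big") % num_groups
    return group, bisect_right(time_splits, timestamp)

def ranges_for_window(device_id, start, end,
                      num_groups=NUM_GROUPS, time_splits=TIME_SPLITS):
    """Return the (group, range) pairs a time-window scan for one device must read."""
    group, first = locate(device_id, start, num_groups, time_splits)
    _, last = locate(device_id, end, num_groups, time_splits)
    return [(group, r) for r in range(first, last + 1)]
```

In this sketch, writes from different devices spread across hash groups, while a time-window query for a single device stays inside one group and reads only the ranges that overlap the window.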
Key idea
Range sharding keeps neighbors together for scans but risks hot spots; hash sharding spreads load evenly but scatters ranges across shards.