Scaling Out
Sharding partitions a collection horizontally across multiple servers called shards, letting data and throughput exceed what one machine can handle. Each shard holds a subset of the documents.
The Shard Key
A shard key is the field or fields used to decide which shard a document lives on. MongoDB divides the key range into chunks and assigns chunks to shards, rebalancing as data grows.
- A good key spreads writes evenly and matches common query filters.
- A monotonically increasing key, like a timestamp, sends all new writes to one shard, a hotspot.
- Hashed shard keys distribute writes but make range queries scatter to all shards.
Routing Queries
A router process called mongos directs queries. If a query includes the shard key it goes to one shard. Without the shard key it becomes a scatter gather across every shard.
Key idea
Sharding spreads a collection across shards by a shard key, and a key that distributes writes and matches query filters avoids hotspots and scatter gather.