When one index is too big
A single index on one node is limited in both size and query throughput. Sharding splits the documents into pieces, each a self-contained index on a different node. The common scheme is document partitioning, where each shard holds a disjoint subset of the documents.
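As a rough illustration of document partitioning, the sketch below assigns each document to a shard by hashing its ID. Hashing is only one possible assignment rule, chosen here as an assumption, and the names `shard_for` and `partition` are hypothetical.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is the same on every run.
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(doc_ids, num_shards):
    """Split document IDs into disjoint per-shard lists."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

Every document lands on exactly one shard, so the subsets are disjoint and together cover the whole collection.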
How a query runs
Search becomes scatter-gather:
- The coordinator sends the query to every shard, since any shard may hold a top result.
- Each shard ranks its own documents and returns its best few.
- The coordinator merges these partial lists into a final ranking.
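The three steps above can be sketched as follows. This is a toy model, not any engine's implementation: shards are plain dictionaries, the score is just a term count, and `search_shard` and `scatter_gather` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, query, k):
    """Rank one shard's documents locally; return its top k (score, doc_id) pairs."""
    # Toy scoring (an assumption): how often the query term occurs in the text.
    scored = ((text.split().count(query), doc_id) for doc_id, text in shard.items())
    return heapq.nlargest(k, (pair for pair in scored if pair[0] > 0))

def scatter_gather(shards, query, k):
    """Send the query to every shard in parallel, then merge the partial lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query, k), shards))
    # Gather: the global top k is drawn from the union of the per-shard top k lists.
    return heapq.nlargest(k, (hit for partial in partials for hit in partial))
```

Each shard returns k results rather than k divided by the number of shards, because in the worst case all of the global top k live on a single shard.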
This fan-out means the coordinator cannot answer until the last shard has replied, so query latency is governed by the slowest shard. Balanced shards and tail latency control therefore matter.
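A small calculation shows why the tail matters more as the shard count grows. The sketch assumes shards are slow independently of one another, which is a simplification, and the function names are hypothetical.

```python
def query_latency(shard_latencies_ms):
    """The coordinator waits for every shard, so the query takes as long as the slowest."""
    return max(shard_latencies_ms)

def prob_slow_query(p_slow_shard: float, num_shards: int) -> float:
    """Chance that at least one shard is slow, assuming shards are independent."""
    return 1.0 - (1.0 - p_slow_shard) ** num_shards
```

Under that assumption, if each shard is slow on 1% of requests, a query that fans out to 100 shards hits at least one slow shard about 63% of the time.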
A scoring subtlety
Because term statistics such as document frequency are computed per shard, the same term can carry a different weight on each shard, so scores are not strictly comparable when the coordinator merges them. Engines either accept the approximation or run a preliminary phase that gathers global statistics so ranking is consistent.
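The effect can be seen with a small sketch that compares per-shard and global inverse document frequency. The smoothed IDF formula below is one common variant, used here as an assumption, and the shard counts are made-up illustrative numbers.

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    return math.log((num_docs + 1) / (doc_freq + 1))

def global_idf(shard_stats):
    """Combine per-shard (num_docs, doc_freq) pairs into one global IDF."""
    total_docs = sum(n for n, _ in shard_stats)
    total_df = sum(df for _, df in shard_stats)
    return idf(total_docs, total_df)

# Hypothetical counts: the term is rare on shard 0 and common on shard 1.
shard_stats = [(1000, 10), (1000, 500)]
local_idfs = [idf(n, df) for n, df in shard_stats]
```

With these numbers the term is weighted far more heavily on the first shard than on the second, while the global value falls between the two. Gathering the counts first and scoring with the global value is what the preliminary phase buys.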
Shards are usually paired with replicas for fault tolerance and to spread read load.
Key idea
Sharding partitions documents across nodes and answers queries by scatter-gather: every shard ranks locally and a coordinator merges the results, with latency bounded by the slowest shard.