System Design

Search Index Sharding

Splitting a large index across machines so search scales horizontally.

6 min read · core

When one index is too big

A single index on one machine is limited in both size and query throughput. Sharding splits the documents into pieces, each a self-contained index on a different node. The common scheme is document partitioning, where each shard holds a disjoint subset of the documents.
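A minimal sketch of document partitioning, assuming documents are routed by a stable hash of their ID. The lesson does not name a routing rule, so the hash choice and the names `shard_for` and `partition` are hypothetical, for illustration only.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to exactly one shard using a stable hash."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(docs: dict[str, str], num_shards: int) -> list[dict[str, str]]:
    """Split docs into disjoint per-shard subsets (assumed hash routing)."""
    shards: list[dict[str, str]] = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards
```

Because each ID hashes to exactly one shard number, the subsets are disjoint and together cover every document.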

How a query runs

Search becomes scatter-gather:

  • The coordinator sends the query to every shard, since any shard may hold a top result.
  • Each shard ranks its own documents and returns its best few.
  • The coordinator merges these partial lists into a final ranking.
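The three steps above can be sketched as follows. This is an illustration under stated assumptions, not any engine's implementation: shards are plain in-memory dicts, the score is a raw term count, and the names `search_shard` and `scatter_gather` are hypothetical.

```python
import heapq


def search_shard(shard: dict[str, str], query: str, k: int) -> list[tuple[int, str]]:
    """Rank one shard's documents locally and return its best k hits."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)  # toy score: term count
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def scatter_gather(shards: list[dict[str, str]], query: str, k: int) -> list[tuple[int, str]]:
    """Send the query to every shard, then merge the partial top-k lists."""
    partials = [search_shard(shard, query, k) for shard in shards]  # scatter
    all_hits = (hit for partial in partials for hit in partial)
    return heapq.nlargest(k, all_hits)  # gather and merge


shards = [{"a": "red fox red"}, {"b": "red red red dog"}, {"c": "blue"}]
top = scatter_gather(shards, "red", 2)
```

Each shard returns at most k hits, so the coordinator merges at most k times the number of shards candidates, however large the shards are.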

The coordinator cannot finish merging until every shard has answered, so this fan-out means latency is governed by the slowest shard. Balanced shards and tail-latency control therefore matter.
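A small worked example of why the tail matters. It assumes shards respond independently and ignores network and merge time on the coordinator. The function names and the sample latencies are hypothetical.

```python
def fanout_latency(shard_latencies_ms: list[float]) -> float:
    """Query latency under scatter-gather: the slowest shard's latency."""
    return max(shard_latencies_ms)


def prob_any_slow(p_slow: float, num_shards: int) -> float:
    """Chance that at least one shard is slow, assuming independence."""
    return 1 - (1 - p_slow) ** num_shards


# Three fast shards and one straggler: the straggler sets the latency.
latency = fanout_latency([12.0, 15.0, 240.0, 14.0])

# If each shard is slow on 1% of requests, a 100-shard query
# hits at least one slow shard on roughly 63% of requests.
p = prob_any_slow(0.01, 100)
```

Under these assumptions, a slow response that is rare on any single shard becomes common for the query as a whole once the fan-out is wide.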

A scoring subtlety

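Because term statistics like document frequency are computed per shard, the same term can carry a different weight on each shard, so scores can drift between shards and are not strictly comparable at merge time. Engines either accept the approximation or run a preliminary phase to gather global statistics so ranking is consistent.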
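To make the drift concrete, here is a sketch using a simplified inverse document frequency, log(N / df). Real engines use smoothed variants. The shard sizes and document frequencies below are made-up numbers for illustration, and the names are hypothetical.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Simplified inverse document frequency: log(N / df)."""
    return math.log(num_docs / doc_freq)


# (documents in shard, documents containing the term); illustrative values.
shard_stats = {"shard_a": (1000, 10), "shard_b": (1000, 500)}

# Per-shard statistics: the term looks rare on shard_a, common on shard_b.
local_idf = {name: idf(n, df) for name, (n, df) in shard_stats.items()}

# Global statistics, as a preliminary gathering phase would compute them.
total_docs = sum(n for n, _ in shard_stats.values())
total_df = sum(df for _, df in shard_stats.values())
global_idf = idf(total_docs, total_df)
```

With local statistics the term weighs far more on `shard_a` than on `shard_b`, so an otherwise identical document scores higher just because of where it is stored. Using `global_idf` on both shards removes that difference.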

Shards are usually paired with replicas for fault tolerance and to spread read load.

Key idea

Sharding partitions documents across nodes and answers queries by scatter-gather: every shard ranks locally and a coordinator merges the results, with latency bounded by the slowest shard.

Check yourself

1. Why must a document-partitioned query reach every shard?

2. What bounds the latency of a scatter-gather search?

3. Why can scores drift across shards?