Databases

Cross Shard Queries and Scatter Gather

Queries that miss the shard key must fan out to every shard and merge the partial results.

When the Shard Key Is Absent

If a query filters on the shard key with an equality condition, the router sends it to exactly one shard. But many queries do not include the shard key, for example a lookup by email when the data is sharded by user ID. To answer them, the system must ask every shard and combine the answers. This pattern is called scatter gather.
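The routing decision can be sketched in a few lines of Python. This is a minimal illustration, not any particular database's router: the shard count, the `user_id` shard key, and the helper names `shard_for` and `target_shards` are all assumptions made for the example, and hash-based placement is assumed rather than range-based.

```python
import zlib

NUM_SHARDS = 4  # assumed cluster size, for illustration only


def shard_for(key: str) -> int:
    """Map a shard key value to a shard id using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def target_shards(filters: dict, shard_key: str = "user_id") -> list:
    """Return the shard ids a query with these equality filters must visit."""
    if shard_key in filters:
        # Shard key present: exactly one shard can hold the matching rows.
        return [shard_for(str(filters[shard_key]))]
    # Shard key absent: any shard may hold matches, so fan out to all of them.
    return list(range(NUM_SHARDS))
```

With these assumptions, `target_shards({"user_id": "u42"})` returns a single shard id, while `target_shards({"email": "a@example.com"})` returns every shard.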

How Scatter Gather Runs

  • Scatter: the query is sent to all shards in parallel.
  • Each shard runs the query against its local subset.
  • Gather: a coordinator merges the partial results into one answer.
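The three steps above can be sketched with a thread pool standing in for the network fan out. The in-memory `SHARDS` list, the `city` filter, and the function names are hypothetical; a real system would issue remote calls and handle timeouts and errors.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards: each holds a local subset of the rows.
SHARDS = [
    [{"id": 1, "city": "Oslo"}, {"id": 5, "city": "Lima"}],
    [{"id": 2, "city": "Lima"}],
    [{"id": 3, "city": "Oslo"}, {"id": 4, "city": "Lima"}],
]


def query_shard(rows, city):
    """Run the query against one shard's local subset."""
    return [row for row in rows if row["city"] == city]


def scatter_gather(city):
    # Scatter: send the same query to every shard in parallel.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, city), SHARDS))
    # Gather: the coordinator merges the partial results into one answer.
    return [row for part in partials for row in part]
```

Here the merge is plain concatenation because the query has no ordering or aggregation; anything beyond a simple filter needs more work at the coordinator.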

Why It Is Expensive

  • Latency is bounded by the slowest shard, not the average: the coordinator cannot return until the last shard has responded.
  • A single slow or failing shard degrades every fan out query.
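  • Sorting, pagination, and aggregation must be redone at the coordinator after merging. For example, to serve rows 101 to 110 of a sorted result, every shard must return its own first 110 rows.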
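To make the merge cost concrete, here is a small sketch of sorted pagination and an average computed across shards. It assumes each shard can return its rows in sorted order and that shards are plain lists of numbers; `shard_page`, `global_page`, and `global_avg` are illustrative names, not part of any library.

```python
import heapq
from itertools import islice


def shard_page(rows, offset, limit):
    """Each shard must return its first offset + limit rows, already sorted,
    because any of them could land on the requested global page."""
    return sorted(rows)[: offset + limit]


def global_page(shards, offset, limit):
    """Coordinator: merge the sorted partials, then apply offset and limit."""
    partials = [shard_page(rows, offset, limit) for rows in shards]
    merged = heapq.merge(*partials)  # lazy k-way merge of sorted inputs
    return list(islice(merged, offset, offset + limit))


def global_avg(shards):
    """Averages cannot be averaged: shards send (sum, count) pairs and the
    coordinator combines them."""
    partials = [(sum(rows), len(rows)) for rows in shards]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Note that the work per shard grows with the offset: deep pages force every shard to return more rows, most of which the coordinator discards.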

Reducing Fan Out

  • Add a secondary index table, sharded by the lookup attribute and mapping it to the shard key, so a lookup reads one index shard and then routes to one data shard.
  • Denormalize so common queries carry the shard key.
  • Maintain a separate search system for queries that inherently span shards.
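The secondary index approach can be sketched as follows. This assumes user rows sharded by `user_id` and an index table sharded by `email`; the dictionaries, the shard count, and the function names are hypothetical stand-ins for real storage.

```python
import zlib

NUM_SHARDS = 4  # assumed cluster size, for illustration only


def shard_for(key: str) -> int:
    """Stable hash placement, shared by the data and index tables."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


# Data table is keyed by user_id; the index table is keyed by email.
data_shards = [dict() for _ in range(NUM_SHARDS)]
index_shards = [dict() for _ in range(NUM_SHARDS)]


def put_user(user_id: str, email: str) -> None:
    """Write the row and its index entry (two writes, possibly two shards)."""
    data_shards[shard_for(user_id)][user_id] = {"user_id": user_id, "email": email}
    index_shards[shard_for(email)][email] = user_id


def get_by_email(email: str):
    """Two single-shard reads instead of a fan out to every shard."""
    user_id = index_shards[shard_for(email)].get(email)
    if user_id is None:
        return None
    return data_shards[shard_for(user_id)].get(user_id)
```

The trade-off is on the write path: every insert now touches two tables that may live on different shards, and this sketch does nothing to keep them consistent if one write fails.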

Key idea

Queries lacking the shard key trigger scatter gather across all shards, with latency bounded by the slowest shard and a costly merge step.
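A rough way to see why the slowest shard dominates, under the simplifying assumption that shards respond independently and each is slow with the same probability. The numbers and function names below are illustrative only.

```python
def fanout_latency_ms(shard_latencies_ms):
    """The coordinator waits for every shard, so latency is the maximum."""
    return max(shard_latencies_ms)


def p_any_slow(p_slow, num_shards):
    """Probability that at least one of num_shards independent shards is slow."""
    return 1 - (1 - p_slow) ** num_shards
```

For example, with shard latencies of 12, 15, 11, and 240 ms the query takes 240 ms even though the mean is 69.5 ms. If each shard is slow on 1% of requests, a fan out across 100 shards hits at least one slow shard about 63% of the time.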

Check yourself

1. When must a sharded system use scatter gather?

2. What bounds the latency of a scatter gather query?