Distributed Global Secondary Indexes

The Problem

In a sharded store, the shard key routes lookups efficiently. But applications often query by other fields, like email when data is sharded by user id. A secondary index lets you do that, and in a distributed system there are two ways to build one.

Local Secondary Index

Each shard indexes only its own rows. The index is partitioned by the shard key of the base data.

Writes stay local because a row and its index entry sit on the same shard.
Reads by the secondary field must scatter to every shard, since matching rows could be anywhere. This is also called a scatter gather read.

Global Secondary Index

The index is partitioned by the indexed field itself, spread across shards independently of the base data.

Reads by that field hit only the shard owning that index range, so they are targeted and fast.
Writes become harder. A single base write may need to update an index entry on a different shard, splitting one logical write across nodes and raising consistency questions.

The Tradeoff

Local indexes make writes cheap and reads broad. Global indexes make reads targeted and writes distributed. Many systems keep global indexes eventually consistent, updating them asynchronously so the base write stays fast at the cost of a brief lag.

Key idea