← Lessons

quiz vs the machine

Silver1060

System Design

The Inverted Index

The data structure that maps words to documents so search can skip scanning everything.

4 min read · intro · beat Silver to climb

The core idea

A naive search reads every document and checks for the query word. That is a forward scan and it does not scale. An inverted index flips the relationship: instead of asking which words a document holds, it stores, for each word, the list of documents that contain it.

That per word list is called a postings list. Each entry usually holds a document id and often the positions where the word appears, which lets the engine answer phrase queries.

Why it is fast

  • The vocabulary of unique terms is small compared to total text.
  • A lookup jumps straight to one term and reads only its postings.
  • Multi word queries intersect postings lists rather than rescanning text.

What it costs

  • The index must be built ahead of time and updated as documents change.
  • It uses extra storage, though postings compress well because ids are sorted.

The inverted index is the foundation under nearly every full text search engine, from a small library to a global web index.

Key idea

An inverted index maps each term to the documents that contain it so queries read only relevant postings instead of scanning all text.

Check yourself

Answer to earn rating on the learn ladder.

1. What does an inverted index map?

2. Why is an inverted index fast for multi word queries?