The Inverted Index

The data structure that maps words to documents so search can skip scanning everything.

The core idea

A naive search reads every document and checks for the query word. That is a forward scan and it does not scale. An inverted index flips the relationship: instead of asking which words a document holds, it stores, for each word, the list of documents that contain it.

That per word list is called a postings list. Each entry usually holds a document id and often the positions where the word appears, which lets the engine answer phrase queries.

Why it is fast

The vocabulary of unique terms is small compared to total text.
A lookup jumps straight to one term and reads only its postings.
Multi word queries intersect postings lists rather than rescanning text.

What it costs

The index must be built ahead of time and updated as documents change.
It uses extra storage, though postings compress well because ids are sorted.

The inverted index is the foundation under nearly every full text search engine, from a small library to a global web index.

Key idea

An inverted index maps each term to the documents that contain it so queries read only relevant postings instead of scanning all text.

The Inverted Index

The core idea

Why it is fast

What it costs

Key idea

Check yourself