The two halves
A search system has an offline path and an online path. The offline path crawls or ingests documents, processes them, and builds an index. The online path takes a user query and returns ranked results in milliseconds.
Offline path
- Ingestion pulls in documents from crawlers, databases, or event streams.
- Processing tokenizes text, extracts fields, and computes signals like popularity.
- Index build writes an inverted index plus stored fields.
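The three offline steps above can be sketched in a few lines. This is a minimal illustration, not a production builder: the document fields ("id", "title", "body", "views"), the regex tokenizer, and the use of view counts as the popularity signal are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index plus stored fields from ingested documents.

    docs: iterable of dicts with the hypothetical keys
    "id", "title", "body", and "views".
    """
    postings = defaultdict(dict)  # term -> {doc_id: term frequency}
    stored = {}                   # doc_id -> fields returned with results
    for doc in docs:
        doc_id = doc["id"]
        # Processing: tokenize the text fields.
        for term in tokenize(doc["title"] + " " + doc["body"]):
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
        # Processing: keep a popularity signal (assumed here to be view count).
        stored[doc_id] = {"title": doc["title"], "popularity": doc["views"]}
    return {"postings": dict(postings), "stored": stored}


# Ingestion: in this sketch the "source" is just an in-memory list.
docs = [
    {"id": 1, "title": "Inverted index basics",
     "body": "An index maps terms to documents.", "views": 120},
    {"id": 2, "title": "Ranking basics",
     "body": "Ranking orders documents by score.", "views": 45},
]
index = build_index(docs)
```

Here index["postings"]["index"] maps document 1 to a term frequency of 2, because the word appears once in the title and once in the body.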
Online path
- Query parsing turns raw text into a structured query.
- Retrieval pulls candidate documents from the index.
- Ranking scores and orders candidates before returning a page.
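The online steps above can be sketched the same way. The prebuilt INDEX literal, the function names, and the scoring rule (summed term frequency, with ties broken by document id) are assumptions for illustration; a real ranker would typically use a stronger scoring function.

```python
import re

# Hypothetical prebuilt index, in the shape an offline builder might emit:
# term -> {doc_id: term frequency}, plus stored fields per document.
INDEX = {
    "postings": {
        "search": {1: 2, 2: 1},
        "ranking": {2: 3, 3: 1},
        "index": {1: 1, 3: 2},
    },
    "stored": {
        1: {"title": "Search indexing"},
        2: {"title": "Search ranking"},
        3: {"title": "Index ranking notes"},
    },
}


def parse_query(raw):
    """Query parsing: turn raw text into a list of normalized terms."""
    return re.findall(r"[a-z0-9]+", raw.lower())


def retrieve(terms, index):
    """Retrieval: collect every document containing at least one term."""
    candidates = set()
    for term in terms:
        candidates.update(index["postings"].get(term, {}))
    return candidates


def rank(terms, candidates, index, page_size=10):
    """Ranking: score by summed term frequency and return one page."""
    def score(doc_id):
        return sum(index["postings"].get(t, {}).get(doc_id, 0) for t in terms)

    # Highest score first; ties broken by doc id for a stable order.
    ordered = sorted(candidates, key=lambda d: (-score(d), d))
    return [(d, index["stored"][d]["title"], score(d))
            for d in ordered[:page_size]]


def search(raw, index=INDEX):
    terms = parse_query(raw)
    return rank(terms, retrieve(terms, index), index)


results = search("Search ranking")
```

With this toy index, document 2 ranks first because it matches both query terms, followed by documents 1 and 3. Nothing in the query path tokenizes documents or computes signals; it only reads structures that were built ahead of time.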
Moving the heavy work offline keeps query latency low. The index is the contract between the two halves: the builder produces it, and the server reads it.
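One way to make that contract concrete is to treat the index as a versioned file that the builder writes and the server loads. The sketch below assumes a JSON file, a hypothetical FORMAT_VERSION tag, and hypothetical write_index and load_index helpers; real systems usually use a binary format, but the handoff has the same shape.

```python
import json
import os
import tempfile

FORMAT_VERSION = 1  # hypothetical version tag that both halves agree on


def write_index(postings, path):
    """Builder side: write to a temp file, then rename it into place."""
    payload = {"version": FORMAT_VERSION, "postings": postings}
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    # Swap the finished file in so a reader sees the old or the new index,
    # not a partially written one.
    os.replace(tmp, path)


def load_index(path):
    """Server side: read the file and refuse formats it does not understand."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    if payload.get("version") != FORMAT_VERSION:
        raise ValueError("unsupported index format")
    return payload["postings"]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "index.json")
    write_index({"search": {"doc1": 2}}, path)
    loaded = load_index(path)
```

The two sides share nothing except the file layout and the version number, so the builder and the server can be changed and deployed separately as long as both honor that format.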
Key idea
Do the expensive work offline so the online path can stay fast, serving ranked results from an index that is only as fresh as its last build.