System Design

Index Refresh and Merge

How immutable segments are made visible and later combined to keep search fast.

Two different operations

Refresh and merge are often conflated, but they solve different problems in a segment-based index.

  • Refresh makes recently written documents visible to search by exposing them as a new segment. It is about freshness and happens often.
  • Merge combines several segments into one larger segment. It is about long-term efficiency and happens in the background.
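
To make the distinction concrete, here is a minimal sketch of a toy in-memory index. The names ToyIndex, Segment, write, refresh, merge, and search are hypothetical and chosen for illustration; they are not the API of any real search engine, and a real merge would pick a subset of segments rather than all of them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """An immutable batch of (doc_id, text) pairs."""
    docs: tuple  # never modified after the segment is created


@dataclass
class ToyIndex:
    buffer: list = field(default_factory=list)    # written, not yet searchable
    segments: list = field(default_factory=list)  # searchable and immutable

    def write(self, doc_id, text):
        self.buffer.append((doc_id, text))

    def refresh(self):
        """Freshness: expose buffered writes as a new searchable segment."""
        if self.buffer:
            self.segments.append(Segment(tuple(self.buffer)))
            self.buffer = []

    def merge(self):
        """Efficiency: rewrite all segments into one larger segment."""
        combined = tuple(pair for seg in self.segments for pair in seg.docs)
        self.segments = [Segment(combined)] if combined else []

    def search(self, term):
        """Every query consults every live segment."""
        return [doc_id
                for seg in self.segments
                for doc_id, text in seg.docs
                if term in text.split()]


index = ToyIndex()
index.write(1, "fast search")
print(index.search("search"))   # [] because the write is not refreshed yet
index.refresh()
print(index.search("search"))   # [1]
```

In this sketch, refresh changes what a query can see, while merge changes only how many segments the query has to visit; the search results are the same before and after a merge.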

Why merging is necessary

Each refresh that picks up new writes adds a segment, and every query must consult all live segments. Too many small segments mean more files to open, more postings lists to intersect, and slower queries. Merging rewrites several segments into one, which:

  • reduces the number of segments a query touches,
  • reclaims space from deleted documents, since deletes are only marked, not removed in place,
  • improves compression because larger sorted runs pack better.
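
The first two effects can be sketched in a few lines. This is an illustrative model only: each segment is assumed to be a dict with a "docs" mapping and a "deleted" set of marked doc ids, and merge_segments and stored_docs are hypothetical helper names.

```python
def merge_segments(segments):
    """Rewrite several segments into one, dropping docs marked as deleted."""
    live_docs = {}
    for seg in segments:
        for doc_id, text in seg["docs"].items():
            if doc_id not in seg["deleted"]:   # the delete was only a marker
                live_docs[doc_id] = text
    return {"docs": live_docs, "deleted": set()}


def stored_docs(segments):
    """Count documents physically stored, including ones marked deleted."""
    return sum(len(seg["docs"]) for seg in segments)


segments = [
    {"docs": {1: "a", 2: "b"}, "deleted": {2}},
    {"docs": {3: "c"}, "deleted": set()},
    {"docs": {4: "d", 5: "e"}, "deleted": {4}},
]
merged = merge_segments(segments)

print(len(segments), stored_docs(segments))   # 3 5
print(1, stored_docs([merged]))               # 1 3
```

Before the merge a query visits three segments and the index still stores the two deleted documents; afterwards it visits one segment that holds only the three live ones.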

The cost and policy

A merge rewrites data, consuming disk I/O and CPU, and competes with live indexing. A merge policy decides which segments to combine, usually grouping segments of similar size to bound write amplification. A deleted document is physically removed only when the segment that holds it is merged away.
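
As a rough sketch of the "similar size" idea, the function below picks a group of segments whose sizes are within a fixed ratio of each other. The name pick_merge and the parameters merge_factor and max_ratio are assumptions made for this example; real merge policies weigh more factors, such as the share of deleted documents and a maximum merged size.

```python
def pick_merge(sizes, merge_factor=3, max_ratio=2.0):
    """Return indices of `merge_factor` similarly sized segments, or None.

    Segments are considered smallest first, so small segments are merged
    with each other instead of being rewritten together with a large one.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    for start in range(len(order) - merge_factor + 1):
        window = order[start:start + merge_factor]
        smallest = sizes[window[0]]
        largest = sizes[window[-1]]
        if largest <= max_ratio * smallest:
            return window
    return None


# Sizes in MB: two large segments and three small ones.
print(pick_merge([100, 5, 4, 6, 90]))   # [2, 1, 3]
print(pick_merge([1, 10, 100]))         # None
```

Merging the 4, 5, and 6 MB segments rewrites only 15 MB. Including the 100 MB segment would rewrite more than 100 MB to absorb a few small segments, which is the write amplification that grouping by size is meant to limit.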

Key idea

Refresh exposes new segments for freshness while background merges fold many small segments into fewer large ones, reclaiming deletes and keeping queries fast.

Check yourself

1. What is the difference between refresh and merge?

2. When do deleted documents actually free space?

3. Why do too many small segments hurt queries?