Two different operations
People often conflate refresh and merge, but they solve different problems on a segment-based index.
- Refresh makes recently indexed documents visible to search by exposing them as a new segment. It is about freshness and happens often.
- Merge combines several segments into one larger segment. It is about long-term efficiency and happens in the background.
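The distinction can be made concrete with a toy model. The sketch below is illustrative only: `ToyIndex` and its methods are hypothetical names, not the API of any real search engine, and a segment is modeled as nothing more than an immutable tuple of strings.

```python
class ToyIndex:
    """Hypothetical in-memory model of a segment-based index."""

    def __init__(self):
        self.buffer = []    # documents written but not yet searchable
        self.segments = []  # immutable, searchable segments

    def add(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        """Seal the buffer into a new segment so its documents become searchable."""
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def merge(self):
        """Rewrite all segments into one; search results do not change."""
        if len(self.segments) > 1:
            merged = tuple(doc for seg in self.segments for doc in seg)
            self.segments = [merged]

    def search(self, term):
        # Every live segment is consulted; the write buffer is not.
        return [doc for seg in self.segments for doc in seg
                if term in doc.split()]


idx = ToyIndex()
idx.add("red fox")
before_refresh = idx.search("fox")  # empty: the document is not visible yet
idx.refresh()
idx.add("blue fox")
idx.refresh()                       # two segments now exist
idx.merge()                         # one segment, same search results
```

In this model refresh changes what a search can see, while merge changes only how the same documents are laid out.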
Why merging is necessary
Each refresh that finds new documents adds a segment, and every query must consult all live segments. Too many small segments means more files to open, more per-segment term lookups and postings lists to read, and therefore slower queries. Merging rewrites several segments into one, which:
- reduces the number of segments a query touches,
- reclaims space from deleted documents, since deletes are only marked, not removed in place,
- improves compression because larger sorted runs pack better.
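How a merge reclaims deleted documents can be sketched as follows. This is a minimal model under stated assumptions: `Segment` and `merge_segments` are hypothetical names, a segment is a dictionary of document IDs to text, and a delete is a tombstone recorded in a separate set rather than a change to the stored data.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    docs: dict                                 # doc_id -> text, never edited in place
    deleted: set = field(default_factory=set)  # tombstones: IDs marked as deleted

    def live_docs(self):
        """Documents that a search should still see."""
        return {i: t for i, t in self.docs.items() if i not in self.deleted}


def merge_segments(segments):
    """Copy only live documents into a new segment with no tombstones."""
    merged = {}
    for seg in segments:
        merged.update(seg.live_docs())
    return Segment(docs=merged)


a = Segment(docs={1: "alpha", 2: "beta"})
a.deleted.add(2)                # marked, but still stored in segment a
b = Segment(docs={3: "gamma"})
merged = merge_segments([a, b])  # document 2 is not copied forward
```

Until the merge runs, the deleted document still occupies space in `a.docs`; only the rewritten segment is free of it.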
The cost and policy
A merge rewrites data, consuming disk I/O and CPU, and so competes with live indexing and search. A merge policy decides which segments to combine, usually grouping segments of similar size to bound write amplification. The space held by a deleted document is reclaimed only when the segment that contains it is merged away.
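One way to group segments of similar size is to bucket them into tiers by order of magnitude and merge only within a tier. The sketch below is a simplified illustration, not the policy of any particular engine: `tier_of`, `pick_merge`, and the `merge_factor` and `base` parameters are hypothetical names and values chosen for the example.

```python
def tier_of(size, base=10):
    """Tier = number of times size can be integer-divided by base before dropping below it."""
    tier = 0
    while size >= base:
        size //= base
        tier += 1
    return tier


def pick_merge(sizes, merge_factor=3, base=10):
    """Return indices of segments to merge, or [] if no tier is full.

    Segments are grouped by tier, and the smallest tier holding at least
    merge_factor segments is chosen, so only similar-sized segments are combined.
    """
    tiers = {}
    for i, size in enumerate(sizes):
        tiers.setdefault(tier_of(size, base), []).append(i)
    for tier in sorted(tiers):
        if len(tiers[tier]) >= merge_factor:
            return tiers[tier][:merge_factor]
    return []


candidates = pick_merge([5, 1200, 7, 3, 900])  # the three tiny segments qualify
nothing = pick_merge([5, 1200, 900])           # no tier has three segments
```

Under a scheme like this a small segment is never merged directly with a much larger one, so a document tends to be rewritten roughly once per tier it passes through, rather than every time a new small segment appears.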
Key idea
Refresh exposes new segments for freshness, while background merges fold many small segments into fewer large ones, reclaiming the space held by deleted documents and keeping queries fast.