The freshness gap
A batch-rebuilt index can be hours stale, yet many systems need a new document to be searchable within seconds. The challenge is that the main on-disk index is immutable for speed, so you cannot simply insert into it.
Buffer then flush
The standard design keeps a small in-memory buffer that accepts new documents immediately.
- Writes go to the buffer and to an append-only transaction log for durability.
- Periodically, the buffer is flushed into a new, small on-disk segment.
- A refresh makes the latest segment visible to queries; the refresh interval is the knob that controls how fresh search feels.
A query searches the big base index and the small recent segments together, so results include just-added documents.
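The write and query path described above can be sketched in a few lines of Python. This is a minimal toy model, not how any particular search engine implements it: the class names `Segment` and `NearRealTimeIndex` are hypothetical, the transaction log is an in-memory list standing in for an append-only file, and the flush and refresh steps are collapsed into a single `refresh()` call for brevity.

```python
from collections import defaultdict


class Segment:
    """An immutable inverted index over a fixed batch of documents."""

    def __init__(self, docs):
        postings = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                postings[term].add(doc_id)
        # Freeze the postings so the segment cannot change after it is built.
        self._postings = {t: frozenset(ids) for t, ids in postings.items()}

    def search(self, term):
        return self._postings.get(term.lower(), frozenset())


class NearRealTimeIndex:
    """Hypothetical sketch: base segment + write buffer + recent segments."""

    def __init__(self, base_docs):
        self.segments = [Segment(base_docs)]  # the large base index
        self.buffer = {}    # accepts writes, not yet searchable
        self.translog = []  # stand-in for an append-only log on disk

    def add(self, doc_id, text):
        self.translog.append((doc_id, text))  # record the write for durability
        self.buffer[doc_id] = text

    def refresh(self):
        # Turn buffered documents into a new small segment and expose it.
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = {}

    def search(self, term):
        # Query the base index and every recent segment, then union the hits.
        hits = set()
        for segment in self.segments:
            hits |= segment.search(term)
        return hits


index = NearRealTimeIndex({1: "stale batch index"})
index.add(2, "fresh index entry")
before = index.search("fresh")  # the document is still only in the buffer
index.refresh()
after = index.search("fresh")   # the new segment is now visible
```

In this sketch a document is invisible to `search` until `refresh()` runs, which is exactly the freshness gap that the refresh interval controls.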
The trade-off
Refreshing more often lowers the freshness gap but creates many tiny segments, which slows queries and forces more merging later. Operators tune the refresh interval to balance freshness against query cost.
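A back-of-the-envelope calculation makes the balance concrete. The sketch below rests on two simplifying assumptions that are not claims about any real system: writes arrive steadily, so every refresh produces one new segment, and no merging happens during the window. The function name `refresh_tradeoff` is hypothetical.

```python
def refresh_tradeoff(window_s, refresh_interval_s):
    """Return (worst-case staleness in seconds, segments created in the window).

    Assumes steady writes (each refresh yields one segment) and no merging.
    """
    segments_created = window_s // refresh_interval_s
    return refresh_interval_s, segments_created


# Compare three refresh intervals over a one-hour window.
results = {}
for interval_s in (1, 5, 30):
    results[interval_s] = refresh_tradeoff(3600, interval_s)
    staleness_s, segments = results[interval_s]
    print(f"refresh every {interval_s}s: up to {staleness_s}s stale, "
          f"{segments} new segments per hour")
```

Under these assumptions, moving from a 30-second to a 1-second refresh cuts worst-case staleness by a factor of 30 and multiplies the number of segments to search and later merge by the same factor.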
Key idea
Near-real-time indexing buffers writes in memory and flushes small searchable segments on a refresh, trading more segments for fresher results.