System Design

Near-Real-Time Indexing

Making new documents searchable within seconds without rebuilding the whole index.

The freshness gap

A batch-rebuilt index can be hours stale, yet many systems need a new document to be searchable within seconds. The difficulty is that the main on-disk index is kept immutable so that reads stay fast, which means you cannot simply insert a document into it.

Buffer then flush

The standard design keeps a small in-memory buffer that accepts new documents immediately.

  • Writes go to the buffer and to an append-only transaction log for durability.
  • Periodically the buffer is flushed into a new, small on-disk segment.
  • A refresh makes the latest segment visible to queries; the refresh interval is the knob that controls how fresh search feels.
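The write path above can be sketched in a few lines of Python. This is a toy illustration, not any particular engine's implementation: the class name NearRealTimeIndex, its methods, and the JSON-lines log format are all hypothetical, and the segments stay in memory for brevity where a real system would write them to disk.

```python
import json
import os
import tempfile


class NearRealTimeIndex:
    """Toy model: in-memory buffer + append-only log + immutable segments."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.buffer = []    # accepted writes, not yet searchable
        self.segments = []  # immutable segments that queries can see

    def add(self, doc_id, text):
        # Append to the log first so the write survives a crash.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": doc_id, "text": text}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.buffer.append((doc_id, text))

    def refresh(self):
        # Freeze the buffer into a new segment and expose it to queries.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def search(self, term):
        # Only segments are searched; the buffer is invisible until refresh.
        return [doc_id
                for segment in self.segments
                for doc_id, text in segment
                if term in text.lower().split()]


log_path = os.path.join(tempfile.mkdtemp(), "translog.jsonl")
index = NearRealTimeIndex(log_path)
index.add(1, "fresh search results")
before = index.search("fresh")  # written, but not yet visible
index.refresh()
after = index.search("fresh")   # visible once the refresh has run
```

The gap between add and refresh is the freshness gap: the document is durable as soon as add returns, but it only shows up in results after the next refresh.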

A query searches the large base index and the small recent segments together, so results include just-added documents.
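As a rough sketch of that query path, assume each segment is a small inverted index mapping a term to the set of document ids that contain it. The helper names build_segment and search are made up for this example, and real engines also handle scoring and deletes, which are left out here.

```python
def build_segment(docs):
    """Build an inverted index for one segment: term -> set of doc ids."""
    postings = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings.setdefault(term, set()).add(doc_id)
    return postings


def search(segments, term):
    """Union the posting lists for `term` across every visible segment."""
    hits = set()
    for segment in segments:
        hits |= segment.get(term, set())
    return sorted(hits)


# The large base index, rebuilt in batch.
base = build_segment({1: "distributed search engine", 2: "batch index rebuild"})
# A small segment produced by a recent refresh.
recent = build_segment({3: "search freshness in seconds"})

results = search([base, recent], "search")
```

Because the query visits every segment in the list, each additional segment adds work to every search.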

The trade-off

Refreshing more often narrows the freshness gap but creates many tiny segments. Because every query has to visit every segment, this slows queries and forces more merging later. Operators tune the refresh interval to balance freshness against query cost.
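A back-of-the-envelope sketch makes the cost concrete. It assumes every refresh finds pending writes and that no merging happens in the meantime, so the counts are upper bounds. The function names and the example intervals are illustrative only.

```python
def segments_created(window_seconds, refresh_interval_seconds):
    """Upper bound on new segments in a window: one per refresh."""
    return window_seconds // refresh_interval_seconds


def merge(segments):
    """Combine several small segments (term -> set of doc ids) into one."""
    merged = {}
    for segment in segments:
        for term, doc_ids in segment.items():
            merged.setdefault(term, set()).update(doc_ids)
    return merged


per_hour_fast = segments_created(3600, 1)   # refresh every second
per_hour_slow = segments_created(3600, 30)  # refresh every 30 seconds

# Merging is the cleanup step: three tiny segments become one.
small = [{"search": {1}}, {"search": {2}, "index": {2}}, {"index": {3}}]
merged = merge(small)
```

Under these assumptions a one-second refresh produces up to 3600 segments an hour, against 120 for a 30-second refresh. The shorter interval buys freshness at the price of far more merge work.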

Key idea

Near-real-time indexing buffers writes in memory and flushes them into small segments that become searchable on a refresh, trading more segments for fresher results.

Check yourself

1. Why are new documents not inserted directly into the main index?

2. What is the cost of refreshing very frequently?