Late Materialization

Delaying row reconstruction until filters have cut down the number of rows.

6 min read · advanced

Reconstructing Rows Late

In a column store, the values of a logical row are scattered across separate columns. Early materialization stitches full rows together up front and then filters them. Late materialization keeps the columns separate, applies filters to the few columns they need, and fetches the other columns only for the rows that survive.

Why Delay Helps

  • Less IO: if a filter keeps one percent of rows, you fetch the wide columns for only that one percent.
  • Operation on compressed data: filters can run directly on encoded columns (for example, by comparing dictionary codes or testing a whole run-length run at once) without decoding every value.
  • Smaller intermediates: position lists of surviving rows are tiny compared to full rows.
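
The IO saving in the first bullet can be made concrete with a rough count of values read. The sketch below is a simplified model, not a real engine's cost formula: the function names and the table shape (one million rows, ten columns, one filter column) are assumptions chosen for illustration, and it counts values only, ignoring column width, compression, and the cost of random access during the gather.

```python
def values_read_early(total_rows, total_columns):
    """Early materialization: every column is read for every row before filtering."""
    return total_rows * total_columns


def values_read_late(total_rows, total_columns, filter_columns, surviving_rows):
    """Late materialization: scan the filter columns fully, then read the
    remaining columns only at the surviving row positions."""
    remaining_columns = total_columns - filter_columns
    return total_rows * filter_columns + surviving_rows * remaining_columns


# Hypothetical table: 1,000,000 rows, 10 columns, a filter on 1 column.
total_rows = 1_000_000
surviving_rows = total_rows // 100  # the filter keeps one percent of rows

early = values_read_early(total_rows, total_columns=10)
late = values_read_late(
    total_rows, total_columns=10, filter_columns=1, surviving_rows=surviving_rows
)
print(early, late)  # 10000000 1090000
```

Under these assumptions the filter column still has to be scanned in full, so the saving comes entirely from the nine columns that are read for ten thousand rows instead of one million.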

How It Flows

The engine scans the filter columns, produces a list of matching row positions, and uses those positions to gather the remaining columns. Reconstruction happens at the very end, only for output rows.
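
The flow above can be sketched on a toy in-memory column store. This is a minimal illustration under stated assumptions, not how any particular engine implements it: the column names, the sample data, and the helpers `scan_positions`, `refine_positions`, `gather`, and `late_materialize` are all hypothetical, and each column is simply a Python list in which position i belongs to row i.

```python
# Toy column store: one list per column; row i is position i in every list.
columns = {
    "price": [5, 120, 40, 300, 15, 250],
    "region": ["eu", "us", "eu", "us", "eu", "eu"],
    "comment": ["a", "b", "c", "d", "e", "f"],  # stands in for a wide column
}


def scan_positions(column, predicate):
    """Scan one full column and return the positions that pass the predicate."""
    return [pos for pos, value in enumerate(column) if predicate(value)]


def refine_positions(column, positions, predicate):
    """Apply another predicate, reading only the already-surviving positions."""
    return [pos for pos in positions if predicate(column[pos])]


def gather(column, positions):
    """Fetch a column's values at the given positions only."""
    return [column[pos] for pos in positions]


def late_materialize(columns, filters, output_columns):
    """filters: non-empty list of (column_name, predicate) pairs.
    Returns the position list and the rows rebuilt for survivors only."""
    first_name, first_predicate = filters[0]
    positions = scan_positions(columns[first_name], first_predicate)
    for name, predicate in filters[1:]:
        positions = refine_positions(columns[name], positions, predicate)
    # Reconstruction happens last: stitch the output columns into row tuples.
    gathered = [gather(columns[name], positions) for name in output_columns]
    return positions, list(zip(*gathered))


positions, rows = late_materialize(
    columns,
    filters=[("price", lambda p: p > 100), ("region", lambda r: r == "eu")],
    output_columns=["price", "comment"],
)
print(positions)  # [5]
print(rows)       # [(250, 'f')]
```

The only thing passed between the filter steps is the position list, and the `comment` column is never touched for rows that a filter has already discarded.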

Key idea

Late materialization filters on the minimal columns first and reconstructs full rows only for survivors, avoiding the IO of stitching rows that a filter would have discarded.

Check yourself

1. What does late materialization delay?

2. Why does delaying reconstruction reduce IO?

3. What small intermediate does late materialization pass between steps?