Reconstructing Rows Late
In a column store, a logical row is scattered across columns: each column holds one attribute, and a row's values are tied together only by their shared position. Early materialization stitches full rows together up front, then filters them. Late materialization keeps columns separate, applies filters on the few needed columns first, and only fetches the other columns for the rows that survive.
Why Delay Helps
- Less IO: if a filter keeps one percent of rows, you fetch the wide columns for only that one percent.
- Operate on compressed data: filters can often run directly on encoded columns (dictionary or run-length encoded, for example) without decoding every value.
- Smaller intermediates: when the filter is selective, position lists of surviving rows are tiny compared to full rows.
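
To make the compressed-data point concrete, here is a minimal sketch that assumes a dictionary-encoded column. The names (dictionary, codes, filter_encoded) and the sample data are hypothetical and not taken from any particular engine. The predicate constant is translated into a code once, and the scan then compares small integers and emits only a position list.

```python
# Hypothetical dictionary-encoded column: each value is stored as a small integer code.
dictionary = ["DE", "FR", "US"]       # code -> decoded value
codes = [2, 0, 2, 1, 0, 2, 1, 2]      # one code per row position


def filter_encoded(codes, dictionary, wanted):
    """Return the positions whose value equals `wanted`, without decoding any row."""
    try:
        # Translate the predicate constant into its code once, up front.
        wanted_code = dictionary.index(wanted)
    except ValueError:
        # The value is absent from the dictionary, so no row can match.
        return []
    # Compare integer codes only; the strings are never materialized.
    return [pos for pos, code in enumerate(codes) if code == wanted_code]


positions = filter_encoded(codes, dictionary, "US")
print(positions)  # [0, 2, 5, 7]
```

The result is a list of four integers rather than four decoded rows, which is the "smaller intermediates" benefit in miniature.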
How It Flows
The engine scans the filter columns, produces a list of matching row positions, and uses those positions to gather the remaining columns. Reconstruction happens at the very end, only for output rows.
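
The flow above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how any specific engine implements it: columns are plain in-memory lists, there is a single filter column, and every name (columns, scan_positions, gather, late_materialize) is made up for the example.

```python
# Hypothetical table stored column by column; all names and data are illustrative.
columns = {
    "status":  ["ok", "err", "ok", "err", "ok"],
    "latency": [12, 250, 8, 900, 15],
    "payload": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],  # stand-in for a wide column
}


def scan_positions(column, predicate):
    """Step 1: scan only the filter column and emit matching row positions."""
    return [pos for pos, value in enumerate(column) if predicate(value)]


def gather(column, positions):
    """Step 2: fetch values from another column at the surviving positions only."""
    return [column[pos] for pos in positions]


def late_materialize(columns, filter_col, predicate, output_cols):
    positions = scan_positions(columns[filter_col], predicate)
    gathered = {name: gather(columns[name], positions) for name in output_cols}
    # Step 3: reconstruct rows at the very end, only for the output.
    return [
        tuple(gathered[name][i] for name in output_cols)
        for i in range(len(positions))
    ]


rows = late_materialize(columns, "status", lambda s: s == "err", ["latency", "payload"])
print(rows)  # [(250, 'bbbb'), (900, 'dddd')]
```

Only two of the five payload values are ever touched here; an early-materializing plan would have read all five before filtering.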
Key idea
Late materialization filters on the minimal columns first and reconstructs full rows only for survivors, avoiding the IO of stitching rows that a filter would have discarded.