The idea
Predicate pushdown moves a query's filter conditions as close to the data as possible, ideally into the storage or scan layer. Instead of reading everything and filtering in memory, the reader skips data that cannot match before it ever leaves disk.
How storage helps
Columnar formats such as Parquet and ORC store per-chunk statistics, such as the minimum and maximum value of each column. If a filter asks for a value outside a chunk's range, the reader skips that whole chunk without decoding it. Partitioned tables can skip whole directories, and file-level indexes can skip whole files.
- Min/max statistics prune row groups that cannot match.
- Partition pruning drops entire partitions based on the filter.
- Dictionary filters skip chunks whose dictionary lacks the wanted value.
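The three skipping rules above can be sketched in a few lines of plain Python. This is a toy model and not any real file format's reader: `Chunk`, `scan_eq`, and `scan_partitioned` are hypothetical names, the "statistics" are computed in memory, and only equality filters are handled.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a row group: values plus precomputed stats."""
    values: list[int]

    def __post_init__(self) -> None:
        # A real columnar file stores these in metadata; here we derive them.
        self.min = min(self.values)
        self.max = max(self.values)
        self.dictionary = set(self.values)


def scan_eq(chunks: list[Chunk], wanted: int) -> tuple[list[int], int]:
    """Return the rows equal to `wanted` and how many chunks were decoded."""
    matches: list[int] = []
    decoded = 0
    for chunk in chunks:
        # Min/max pruning: the value lies outside this chunk's range.
        if wanted < chunk.min or wanted > chunk.max:
            continue
        # Dictionary filter: in range, but the value never occurs here.
        if wanted not in chunk.dictionary:
            continue
        decoded += 1  # only now do we pay for reading the chunk's rows
        matches.extend(v for v in chunk.values if v == wanted)
    return matches, decoded


def scan_partitioned(
    partitions: dict[str, list[Chunk]], key: str, wanted: int
) -> tuple[list[int], int]:
    """Partition pruning: only the partition named by the filter is opened."""
    return scan_eq(partitions.get(key, []), wanted)


chunks = [Chunk([1, 2, 3]), Chunk([10, 12, 19]), Chunk([20, 25, 25])]
partitions = {"2024-01": chunks[:2], "2024-02": chunks[2:]}

print(scan_eq(chunks, 25))                       # one chunk decoded
print(scan_eq(chunks, 15))                       # in range of chunk 2, not in its dictionary
print(scan_partitioned(partitions, "2024-01", 25))  # partition with the match is never opened
```

In this sketch a lookup for 25 decodes one chunk out of three, and a lookup for 15 decodes none: the min/max check rules out two chunks and the dictionary check rules out the third.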
Why it matters
Less data read means less network traffic, less decoding, and less compute. When the filter is selective and the data is sorted or clustered on the filtered column, pushdown can turn a full scan into a read of a small fraction of the file.
Key idea
Predicate pushdown sends filters down to storage so that chunk statistics and partition metadata can skip non-matching data before it reaches compute, sharply reducing the amount of data read.