System Design

Predicate Pushdown Deep Dive

Pushing filters down to the storage layer so it skips data before it ever reaches compute.

The idea

Predicate pushdown moves a query's filter conditions as close to the data as possible, ideally into the storage or scan layer. Instead of reading everything and filtering in memory, the reader skips data that cannot match before it ever leaves disk.
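
The contrast can be sketched with a toy in-memory "storage layer". Everything here is hypothetical and for illustration only: the `CHUNKS` layout and both function names are assumptions, not any real engine's API. Both functions return the same rows; they differ only in how many rows they had to read.

```python
# Toy storage (hypothetical layout): each chunk keeps its rows plus
# the min and max of the values it contains.
CHUNKS = [
    {"min": 1, "max": 30, "rows": [1, 12, 30]},
    {"min": 31, "max": 60, "rows": [31, 45, 60]},
    {"min": 61, "max": 90, "rows": [61, 75, 90]},
]


def scan_then_filter(chunks, lo):
    """Read every chunk, then apply `value >= lo` in memory."""
    rows_read = 0
    result = []
    for chunk in chunks:
        rows_read += len(chunk["rows"])
        result.extend(v for v in chunk["rows"] if v >= lo)
    return result, rows_read


def scan_with_pushdown(chunks, lo):
    """Give the filter to the reader so it can skip chunks by metadata."""
    rows_read = 0
    result = []
    for chunk in chunks:
        if chunk["max"] < lo:
            continue  # no row in this chunk can satisfy value >= lo
        rows_read += len(chunk["rows"])
        # Surviving chunks may still hold non-matching rows, so filter them.
        result.extend(v for v in chunk["rows"] if v >= lo)
    return result, rows_read
```

With a filter of `value >= 70`, the first function reads all nine rows while the second reads only the three rows of the last chunk.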

How storage helps it

Columnar files store per-chunk statistics, such as the minimum and maximum value of each column. If a filter asks for a value outside a chunk's range, the reader skips that whole chunk without decoding it. Partitioned tables let the reader skip whole directories, and file-level indexes let it skip whole files.

  • Min/max stats prune row groups that cannot match.
  • Partition pruning drops entire partitions based on the filter.
  • Dictionary filters skip chunks whose dictionary lacks the wanted value.
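
The three checks in the list can be sketched as one skip decision for a filter of the form "partition = P AND column = V". This is a minimal sketch, not a real file format's API: `ChunkMeta`, its fields, and `skip_reason` are hypothetical names, and it assumes the dictionary lists every distinct value stored in the chunk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkMeta:
    """Hypothetical per-chunk metadata a reader can check without reading rows."""
    partition: str         # directory the chunk lives in, e.g. "country=US"
    min_value: int         # smallest value of the filtered column in the chunk
    max_value: int         # largest value of the filtered column in the chunk
    dictionary: frozenset  # assumed to hold every distinct value in the chunk


def skip_reason(meta, partition, value):
    """Return which check lets the reader skip this chunk, or None to read it."""
    if meta.partition != partition:
        return "partition"   # partition pruning: wrong directory
    if value < meta.min_value or value > meta.max_value:
        return "min/max"     # value lies outside the chunk's range
    if value not in meta.dictionary:
        return "dictionary"  # in range, but never actually stored
    return None              # cannot be ruled out: the chunk must be read


chunks = [
    ChunkMeta("country=US", 1, 50, frozenset({1, 20, 50})),
    ChunkMeta("country=US", 10, 90, frozenset({10, 40, 90})),
    ChunkMeta("country=US", 60, 99, frozenset({60, 99})),
    ChunkMeta("country=DE", 1, 99, frozenset({1, 20, 99})),
]

# Filter: country = 'US' AND column = 20
reasons = [skip_reason(m, "country=US", 20) for m in chunks]
```

Only the first chunk survives all three checks. The second falls inside the min/max range but is ruled out by its dictionary, which shows why the dictionary check can prune chunks that range statistics alone cannot.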

Why it matters

Less data read means less network traffic, less decoding, and less compute. When the filter is selective and the data is sorted or clustered on the filtered column, pushdown can turn a full scan into a read of a small fraction of the file.
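
How small that fraction gets depends on how the data is laid out. The sketch below is an illustration with made-up numbers (10,000 integer values, chunks of 100, and a hypothetical helper `fraction_read`), not a benchmark. It counts how many chunks survive a min/max check for a single-value lookup, first on sorted data and then on the same data shuffled.

```python
import random


def fraction_read(values, chunk_size, target):
    """Fraction of chunks whose [min, max] range could contain `target`."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    kept = sum(1 for c in chunks if min(c) <= target <= max(c))
    return kept / len(chunks)


values = list(range(10_000))

# Sorted data: chunk ranges do not overlap, so one chunk can match.
sorted_fraction = fraction_read(values, 100, 4_242)

# Shuffled data: nearly every chunk spans the target, so little is skipped.
random.Random(0).shuffle(values)
shuffled_fraction = fraction_read(values, 100, 4_242)
```

On the sorted layout only 1 chunk in 100 has to be read. On the shuffled layout the fraction stays close to 1.0, because almost every chunk's range covers the target and the statistics cannot rule it out.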

Key idea

Predicate pushdown sends filters down to storage so that chunk statistics and partition metadata let the reader skip non-matching data before it reaches compute, sharply reducing the amount of data read.

Check yourself

1. What does predicate pushdown move toward storage?

2. How do min/max statistics enable skipping?

3. What is the main benefit of pushdown?