Vectorized Execution
Traditional row-at-a-time engines pay interpretation overhead (function calls, type dispatch) for every row they process. ClickHouse uses a vectorized engine that processes blocks of thousands of values from a single column together, which amortizes that overhead across the block and keeps the CPU in tight loops.
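The difference can be sketched in a few lines of Python. This is only an illustration of the call pattern, not ClickHouse code: the block size and the function names (add_one_row, add_one_block) are hypothetical, and plain Python does not reproduce the actual speedup.

```python
from array import array

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def add_one_row(value):
    """Row-at-a-time operator: one call per value."""
    return value + 1


def row_at_a_time(column):
    """Apply the operator once per row and count the calls made."""
    out = array("q")
    calls = 0
    for value in column:
        out.append(add_one_row(value))
        calls += 1
    return out, calls


def add_one_block(block):
    """Block operator: one call handles every value in the block."""
    return array("q", (value + 1 for value in block))


def block_at_a_time(column, block_size=BLOCK_SIZE):
    """Apply the operator once per block and count the calls made."""
    out = array("q")
    calls = 0
    for start in range(0, len(column), block_size):
        out.extend(add_one_block(column[start:start + block_size]))
        calls += 1
    return out, calls


column = array("q", range(10_000))
rows_out, row_calls = row_at_a_time(column)
blocks_out, block_calls = block_at_a_time(column)
print(row_calls, block_calls)
```

Both paths produce the same output column; only the number of operator calls differs, and that per-call cost is what the block approach spreads over many values.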
Why Blocks Are Fast
- Less overhead: one function call handles a whole block, not one row.
- SIMD: a CPU can apply one instruction to many packed values at once.
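- Cache locality: values of one column sit contiguously in memory, so a block streams through cache without pulling in unrelated fields.
- Branch-friendly: tight loops over uniformly typed data have few, predictable branches, which keeps the pipeline full.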
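The first point can be made concrete with a toy cost model. The numbers below (20 ns per call, 1 ns of work per value) are assumptions picked for illustration, not measurements of ClickHouse, and the function name cost_per_value is hypothetical.

```python
def cost_per_value(call_overhead_ns, work_per_value_ns, block_size):
    """Amortized cost of one value when a fixed per-call overhead
    is shared by every value in the block."""
    return work_per_value_ns + call_overhead_ns / block_size


# Hypothetical figures: 20 ns fixed overhead per call, 1 ns of real work per value.
for block_size in (1, 64, 8192):
    print(block_size, cost_per_value(20.0, 1.0, block_size))
```

Under these assumed figures the overhead dominates at a block size of 1 and becomes negligible once a block holds thousands of values. The model ignores SIMD and cache effects, which push in the same direction.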
Working With Columns
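Because data is columnar, a block carries a contiguous slice of each column it holds, and an operator works on one column slice at a time. A filter evaluates its condition over the whole slice and produces a mask with one entry per value; later operators apply that mask to skip rejected values. Aggregations fold each block into a partial state, and the partial states are then combined into the final result.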
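A minimal sketch of that filter-then-aggregate flow follows, assuming an average over values above a threshold. The helper names (filter_mask, sum_count_partial, merge_states) and the tiny block size are hypothetical and chosen for readability; this is not how ClickHouse implements it internally.

```python
from array import array


def filter_mask(block, threshold):
    """Evaluate `value > threshold` over the block: 1 = keep, 0 = reject."""
    return bytearray(1 if value > threshold else 0 for value in block)


def sum_count_partial(block, mask):
    """Fold one block into a partial (sum, count) state.
    Multiplying by the mask skips rejected values without an if-branch."""
    total = 0
    count = 0
    for value, keep in zip(block, mask):
        total += value * keep
        count += keep
    return total, count


def merge_states(left, right):
    """Combine two partial (sum, count) states."""
    return left[0] + right[0], left[1] + right[1]


def filtered_avg(column, threshold, block_size=4):
    """Average of the values above `threshold`, computed block by block."""
    state = (0, 0)
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        mask = filter_mask(block, threshold)
        state = merge_states(state, sum_count_partial(block, mask))
    total, count = state
    return total / count if count else None


column = array("q", [5, 12, 7, 30, 18, 2, 25, 9, 40])
result = filtered_avg(column, 10)
print(result)
```

Keeping the aggregate as a mergeable (sum, count) pair is the design point: each block can be processed independently, and the partial states can be combined in any order.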
Key idea
ClickHouse processes columnar data in large blocks rather than row by row, using SIMD and cache-friendly loops, which is what lets it scan on the order of billions of rows per second on suitable hardware and queries.