Rows vs Columns
A row store keeps all fields of a record together on disk. A column store keeps each column in its own contiguous region. Analytical queries usually read a few columns out of many, so reading only the needed columns avoids loading irrelevant bytes.
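To make the two layouts concrete, here is a minimal sketch in Python. The three-field record (id, price, quantity), the fixed-width binary encoding, and the function names are all illustrative assumptions, not the format of any particular database. It sums one column under each layout and reports how many bytes had to be scanned.

```python
import struct

# Hypothetical records: (id: int32, price: float64, quantity: int32)
records = [(1, 9.99, 3), (2, 4.50, 10), (3, 20.00, 1)]

# "<" means little-endian with no padding, so each row is 4 + 8 + 4 = 16 bytes.
ROW_FORMAT = "<idi"
ROW_SIZE = struct.calcsize(ROW_FORMAT)

def encode_row_store(rows):
    """Row layout: all fields of a record sit next to each other."""
    return b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)

def encode_column_store(rows):
    """Column layout: each column gets its own contiguous buffer."""
    ids, prices, quantities = zip(*rows)
    n = len(rows)
    return {
        "id": struct.pack(f"<{n}i", *ids),
        "price": struct.pack(f"<{n}d", *prices),
        "quantity": struct.pack(f"<{n}i", *quantities),
    }

def sum_quantity_from_rows(buf):
    """Must walk every full row, even though only one field is needed."""
    total = 0
    for offset in range(0, len(buf), ROW_SIZE):
        _, _, quantity = struct.unpack_from(ROW_FORMAT, buf, offset)
        total += quantity
    return total, len(buf)  # (result, bytes scanned)

def sum_quantity_from_columns(columns):
    """Reads only the buffer for the one column the query needs."""
    buf = columns["quantity"]
    n = len(buf) // struct.calcsize("<i")
    return sum(struct.unpack(f"<{n}i", buf)), len(buf)  # (result, bytes scanned)

row_buf = encode_row_store(records)
col_bufs = encode_column_store(records)
print(sum_quantity_from_rows(row_buf))      # scans all 48 bytes
print(sum_quantity_from_columns(col_bufs))  # scans only the 12-byte quantity buffer
```

Both functions return the same sum. The difference is the number of bytes scanned, and the gap widens as the table gains columns the query does not touch.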
Why It Helps Analytics
- Less I/O: a query touching three of fifty columns reads only those three.
- Better compression: values in one column share a type and range, so encodings like dictionary and run-length encoding shrink them far more than mixed row data.
- Vectorized execution: tight arrays of one type let the CPU process many values per instruction (SIMD).
- Cache-friendly: scanning a single column streams through memory sequentially, which suits CPU caches and prefetchers.
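The compression point can be illustrated with a small Python sketch of the two encodings named above. The function names and the sample country column are assumptions made for illustration. Real column stores use more elaborate, bit-packed variants.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse each run of equal adjacent values into (value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def dictionary_encode(values):
    """Replace each value with a small integer code into a dictionary of distinct values."""
    dictionary = {}
    codes = []
    for value in values:
        # A new value gets the next unused code; a known value reuses its code.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return list(dictionary), codes

# Hypothetical low-cardinality column, as is common for fields like country or status.
country = ["US", "US", "US", "DE", "DE", "US"]

print(run_length_encode(country))
print(dictionary_encode(country))
```

Both encodings work well here because a single column tends to contain few distinct values and long runs. A row store interleaves this column with unrelated fields, which breaks up the runs and mixes types.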
The Tradeoff
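Columnar layout is slow for single-row writes and point lookups because one record is spread across many column files or segments: an insert must touch every column, and a lookup must reassemble the row from each one. That is why row stores still win for transactional workloads.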
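A rough way to see the write cost is to count how many separate storage regions one insert touches. The sketch below uses in-memory Python lists as stand-ins for on-disk files. The class names and the "regions touched" counter are hypothetical, and real systems soften this cost with write buffers and batching.

```python
class RowStore:
    """All fields of a record are appended to a single region."""

    def __init__(self):
        self.rows = []

    def insert(self, record):
        self.rows.append(dict(record))
        return 1  # regions touched by this write

    def get(self, index):
        return self.rows[index]  # one read returns the whole record


class ColumnStore:
    """Each column lives in its own region, so a record is split on write."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}

    def insert(self, record):
        for name, values in self.columns.items():
            values.append(record[name])
        return len(self.columns)  # one region touched per column

    def get(self, index):
        # A point lookup must visit every column to reassemble the record.
        return {name: values[index] for name, values in self.columns.items()}


record = {"id": 1, "price": 9.99, "quantity": 3}
row_store = RowStore()
column_store = ColumnStore(["id", "price", "quantity"])

print(row_store.insert(record))     # regions touched in the row layout
print(column_store.insert(record))  # regions touched in the column layout
```

With fifty columns instead of three, the columnar insert would touch fifty regions while the row insert still touches one.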
Key Idea
Columnar storage reads only the columns a query needs and compresses them tightly, making it ideal for scans over wide tables even though it is poor for single-row writes and point lookups.