Columnar Storage Benefits

Why analytics engines store data column by column instead of row by row.

Rows vs Columns

A row store keeps all fields of a record together on disk. A column store keeps each column in its own contiguous region. Analytical queries usually read a few columns out of many, so reading only the needed columns avoids loading irrelevant bytes.
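To make the difference concrete, here is a minimal sketch in Python. The table, its field names, and the helper functions are all hypothetical, and in-memory lists stand in for regions on disk. It sums one field under each layout and counts how many values had to be visited.

```python
# Hypothetical three-record table with four fields (illustrative names only).
rows = [
    {"id": 1, "country": "DE", "amount": 10.0, "note": "a"},
    {"id": 2, "country": "DE", "amount": 12.5, "note": "b"},
    {"id": 3, "country": "FR", "amount": 7.5, "note": "c"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_from_rows(rows, field):
    """Row layout: each whole record is loaded just to reach one field."""
    touched = 0
    total = 0.0
    for row in rows:
        touched += len(row)  # every field of the record comes along
        total += row[field]
    return total, touched


def sum_from_columns(columns, field):
    """Column layout: only the requested column is scanned."""
    values = columns[field]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = sum_from_rows(rows, "amount")
col_total, col_touched = sum_from_columns(columns, "amount")
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts produce the same sum, but the row version visits all four fields of every record while the column version visits only the `amount` values. On a real fifty-column table the gap grows accordingly.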

Why It Helps Analytics

  • Less I/O: a query touching three of fifty columns reads only those three.
  • Better compression: values in one column share a type and range, so encodings like dictionary and run-length encoding shrink them far more than mixed row data.
  • Vectorized execution: tight arrays of one type let the CPU process many values per instruction (SIMD).
  • Cache-friendly: scanning a single column streams through memory in a predictable, sequential pattern.
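The compression point can be sketched with the two encodings named above. This is a simplified illustration, not how any particular engine implements them; the function names and the sample `country` column are assumptions made for the example.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def dictionary_encode(values):
    """Replace each value with a small integer code into a lookup list."""
    codes_by_value = {}
    codes = []
    for value in values:
        # Assign the next free code the first time a value is seen.
        code = codes_by_value.setdefault(value, len(codes_by_value))
        codes.append(code)
    return list(codes_by_value), codes


# Hypothetical column: one type, few distinct values, long runs.
country = ["DE", "DE", "DE", "FR", "FR", "US"]

rle = run_length_encode(country)
dictionary, codes = dictionary_encode(country)
print(rle)
print(dictionary, codes)
```

Six strings shrink to three runs, or to three dictionary entries plus six small integers. The same encodings gain little on row-ordered data, where neighbouring values belong to different fields and rarely repeat.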

The Tradeoff

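Columnar layout is slow for single-row writes and point lookups, because the fields of one record are spread across separate column regions (often separate files), so writing or reading one row touches every one of them. That is why row stores still win for transactional workloads.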
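A rough sketch of that cost, again with hypothetical names and in-memory lists standing in for storage regions: inserting one record is a single append in the row layout, but one append per column in the columnar layout, and reading a record back means gathering a value from every column.

```python
def insert_row_store(table, record):
    """Row layout: the whole record lands in one place."""
    table.append(record)
    return 1  # one region written


def insert_column_store(columns, record):
    """Column layout: each field goes to its own region."""
    writes = 0
    for name, value in record.items():
        columns.setdefault(name, []).append(value)
        writes += 1
    return writes


def fetch_from_column_store(columns, index):
    """Point lookup: reassemble one record from every column."""
    return {name: values[index] for name, values in columns.items()}


record = {"id": 4, "country": "US", "amount": 3.0, "note": "d"}

table = []
store = {}
row_writes = insert_row_store(table, record)
column_writes = insert_column_store(store, record)
fetched = fetch_from_column_store(store, 0)
print(row_writes, column_writes)
```

With four fields the columnar insert performs four separate writes against one for the row store. Real column stores often soften this by buffering and batching writes, but the per-record fan-out is the underlying reason for the tradeoff.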

Key idea

Columnar storage reads only the columns a query needs and compresses them tightly, making it ideal for scans over wide tables even though it is poor for single-row writes and point lookups.

Check yourself

1. Why does columnar storage compress better than row storage?

2. Which workload suits a row store better than a column store?