Databases

Columnar Storage

Why analytics engines store data by column instead of by row.

Rows vs Columns

A traditional row-oriented database stores each row's fields together on disk. A columnar store instead keeps all values of one column together. For analytical queries that read a few columns across millions of rows, this layout reads far less data, because columns the query never references do not have to be read at all.
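
To make the difference concrete, here is a minimal sketch of the two layouts using plain Python lists. The table, its column names (`id`, `city`, `amount`), and the helper functions are hypothetical and exist only for illustration; a real engine works with disk pages or files rather than in-memory lists.

```python
# Hypothetical three-row table, used only for illustration.
rows = [
    {"id": 1, "city": "Oslo", "amount": 30},
    {"id": 2, "city": "Lima", "amount": 45},
    {"id": 3, "city": "Oslo", "amount": 25},
]

# Row layout: each record's fields sit next to each other.
row_store = [(r["id"], r["city"], r["amount"]) for r in rows]

# Column layout: all values of one column sit next to each other.
column_store = {
    "id": [r["id"] for r in rows],
    "city": [r["city"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def sum_amount_row_store(store):
    """Sum 'amount' from the row layout; every field of every row is visited."""
    total = 0
    values_touched = 0
    for record in store:
        values_touched += len(record)  # the whole row is read to reach one field
        total += record[2]
    return total, values_touched


def sum_amount_column_store(store):
    """Sum 'amount' from the column layout; only that one column is visited."""
    column = store["amount"]
    return sum(column), len(column)


row_total, row_touched = sum_amount_row_store(row_store)
col_total, col_touched = sum_amount_column_store(column_store)
print(row_total, row_touched)
print(col_total, col_touched)
```

Both functions return the same total, but the row layout touches every field of every row while the column layout touches only the `amount` values. With wide tables and millions of rows, that gap is where the saving comes from.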

Why It Is Fast For Analytics

  • A query that sums one column reads only that column, not whole rows.
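  • Values in a column share a type and often resemble each other, so they compress far better than mixed row data.
  • Repeated values enable tricks like run-length encoding and dictionary encoding.
  • Many engines can scan compressed blocks directly and use per-chunk metadata, such as minimum and maximum values, to skip chunks that cannot match the query.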
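
The compression and skipping ideas above can be sketched in a few lines. This is an illustrative toy, not how any particular engine implements them: the function names, the sample data, and the choice of per-block minimum and maximum values as the skipping metadata are all assumptions made for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def dictionary_encode(values):
    """Replace each value with a small integer code plus a lookup table."""
    dictionary = []  # code -> original value
    index = {}       # original value -> code
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def build_blocks(values, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return blocks


def count_greater_than(blocks, threshold):
    """Count values above threshold, skipping blocks whose max rules them out."""
    matched = 0
    scanned = 0
    for block in blocks:
        if block["max"] <= threshold:
            continue  # no value in this block can match, so it is never scanned
        scanned += 1
        matched += sum(1 for v in block["values"] if v > threshold)
    return matched, scanned


# Hypothetical sample columns.
status = ["ok", "ok", "ok", "fail", "fail", "ok"]
rle = run_length_encode(status)
dictionary, codes = dictionary_encode(status)

amounts = [3, 5, 4, 90, 95, 91, 7, 2, 6]
blocks = build_blocks(amounts, block_size=3)
matched, scanned = count_greater_than(blocks, threshold=50)
```

Here the six status strings shrink to three runs, or to a two-entry dictionary plus small integer codes, and the threshold query scans only one of the three blocks.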

The Trade-Off

  • Reading a single full row means gathering one value from each column file.
  • Inserting one row writes to every column, so single-row writes are slower.
  • This is why columnar stores power OLAP while row stores power OLTP.
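
The cost of single-row work shows up even in a toy model. The sketch below assumes a hypothetical column store held as one Python list per column, aligned by position; `read_row` and `insert_row` are illustrative helpers, not part of any real engine.

```python
# Hypothetical column store: one list per column, aligned by position.
columns = {
    "id": [1, 2, 3],
    "city": ["Oslo", "Lima", "Oslo"],
    "amount": [30, 45, 25],
}


def read_row(store, position):
    """Rebuild one row by taking the value at `position` from every column."""
    return {name: values[position] for name, values in store.items()}


def insert_row(store, row):
    """Append one row and return how many columns had to be modified."""
    touched = 0
    for name, values in store.items():
        values.append(row[name])
        touched += 1
    return touched


second_row = read_row(columns, 1)
columns_touched = insert_row(columns, {"id": 4, "city": "Rome", "amount": 60})
```

Both operations visit every column, so their cost grows with the width of the table. In a row layout, the same read or insert would touch one contiguous record.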

Key idea

Columnar storage groups each column's values together, so analytical scans read only the columns they need and those columns compress well, at the cost of slower single-row reads and writes.

Check yourself

1. Columnar storage is fastest for queries that:

2. Why does columnar data compress so well?