System Design

Columnar Processing

Storing data by column to read only needed fields and compress aggressively for analytics.

Rows versus columns

A row store keeps all fields of a record together, which suits transactional reads of whole rows. A column store keeps each field's values together across all rows. Analytics usually scans a few columns over many rows, so columnar layout reads far less data.
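As a rough illustration of the two layouts, the sketch below holds the same three records both ways and counts how many values each layout reads to total one field. The sample data, the field names, and the helpers `total_amount_row` and `total_amount_column` are hypothetical, and in-memory Python lists only stand in for on-disk blocks.

```python
# Three records with four fields each (hypothetical sample data).
rows = [
    {"id": 1, "country": "DE", "amount": 20, "note": "a"},
    {"id": 2, "country": "FR", "amount": 35, "note": "b"},
    {"id": 3, "country": "DE", "amount": 15, "note": "c"},
]

# Row layout: all fields of one record sit together.
row_store = [tuple(r.values()) for r in rows]

# Column layout: all values of one field sit together across rows.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def total_amount_row(store):
    """Sum the amount field; every full record is read to reach it."""
    values_read = 0
    total = 0
    for record in store:
        values_read += len(record)  # the whole record comes along
        total += record[2]          # amount is the third field
    return total, values_read


def total_amount_column(store):
    """Sum the amount field; only that one column is read."""
    column = store["amount"]
    return sum(column), len(column)


row_total, row_values_read = total_amount_row(row_store)
col_total, col_values_read = total_amount_column(column_store)
```

Both layouts give the same total, but the row layout reads all twelve stored values while the column layout reads only the three in `amount`.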

Why columnar wins for analytics

  • Selective reads: a query touching three of fifty columns reads only those three, skipping the rest entirely.
  • Better compression: a column holds values of one type, often with few distinct values or long repeated runs, so encodings such as dictionary encoding and run-length encoding shrink it dramatically.
  • Vectorized execution: processing a column as a batch lets the CPU work on many values per instruction.
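The compression point can be sketched with the two encodings named above. This is a minimal illustration, not how any particular columnar format implements them; the function names and the sample `country` column are assumptions made for the example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer code."""
    table = []   # distinct values, in order of first appearance
    index = {}   # value -> its code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(table)
            table.append(v)
        codes.append(index[v])
    return table, codes


def run_length_encode(values):
    """Collapse each run of equal adjacent values into (value, count)."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [v for v, count in runs for _ in range(count)]


# Hypothetical column with few distinct values and repeated runs.
country = ["DE", "DE", "DE", "FR", "FR", "DE"]
table, codes = dictionary_encode(country)
runs = run_length_encode(codes)
restored = [table[c] for c in run_length_decode(runs)]
```

Here six strings become a two-entry table plus three runs, and decoding restores the original column. The same trick works poorly on a row, where adjacent values belong to different fields and rarely repeat.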

The trade-off

Columnar formats are slow at writing or updating a single full row, because one record is scattered across many column blocks and each of those blocks must be touched. They favor append-and-scan workloads over point updates.
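A toy model makes the write cost visible. The `ColumnTable` class below is hypothetical: it keeps one list per field as a stand-in for one block per column and counts how many column blocks each operation touches. Real systems batch and buffer writes, so treat the counts as an illustration of the shape of the cost only.

```python
class ColumnTable:
    """Toy column store: one list per field, standing in for one block per column."""

    def __init__(self, fields):
        self.columns = {name: [] for name in fields}
        self.blocks_touched = 0

    def append_row(self, record):
        # One record is split across every column.
        for name, column in self.columns.items():
            column.append(record[name])
            self.blocks_touched += 1

    def update_row(self, position, record):
        # Rewriting a full row also visits every column.
        for name, column in self.columns.items():
            column[position] = record[name]
            self.blocks_touched += 1

    def scan(self, name):
        # Reading one field for all rows visits a single column.
        self.blocks_touched += 1
        return list(self.columns[name])


table = ColumnTable(["id", "country", "amount", "note"])
table.append_row({"id": 1, "country": "DE", "amount": 20, "note": "a"})
touched_after_append = table.blocks_touched

table.update_row(0, {"id": 1, "country": "FR", "amount": 25, "note": "b"})
touched_after_update = table.blocks_touched

amounts = table.scan("amount")
touched_after_scan = table.blocks_touched
```

With four fields, each single-row write touches four column blocks, while scanning one field touches one block no matter how many rows it holds.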

Key idea

Columnar processing stores each field's values together, so analytic queries read only the columns they need and compress them tightly, trading single-row update speed for fast scans over many rows.

Check yourself

1. Why does columnar layout help analytic queries?

2. Why does a column compress well?

3. What workload does columnar favor?