← Lessons

quiz vs the machine

Silver1100

System Design

The Columnar Analytics Engine

Storing data by column so analytic scans read only the fields they need.

4 min read · intro · beat Silver to climb

Rows versus columns

A row store keeps all fields of a record together, which is great when you read whole records. A columnar store keeps each column contiguous, which is great when an analytic query touches only a few columns over millions of rows.

Why columns win for analytics

  • A query like average price scans one column, so the engine skips every other field on disk.
  • Values in a column are similar, so compression like run length and dictionary encoding shrinks them hard.
  • Compact columns mean more data fits in cache and vectorized CPU loops process batches at once.

The trade off

Inserting or updating a single record now touches many separate column files, so columnar engines favor bulk loads and append heavy workloads over frequent small writes.

This is why warehouses and analytic engines like the columnar layout while transactional databases stay row oriented.

Key idea

Columnar storage reads only the needed columns and compresses them well, making wide table analytic scans fast.

Check yourself

Answer to earn rating on the learn ladder.

1. Why is columnar storage fast for analytics?

2. What workload does a columnar engine handle worst?