← Lessons

quiz vs the machine

Silver1100

System Design

Data Warehouse vs Data Lake

Comparing structured query stores against cheap raw storage for any data shape.

4 min read · intro · beat Silver to climb

Two storage philosophies

A data warehouse stores cleaned, structured tables optimized for fast SQL analytics. A data lake stores raw files of any shape at low cost.

Data warehouse

  • Holds schema on write data, meaning structure is enforced when you load it.
  • Tuned for business intelligence, dashboards, and complex joins.
  • More expensive per byte but very fast for analytical queries.

Data lake

  • Holds schema on read data, so structure is applied only when you query.
  • Accepts logs, images, JSON, and CSV side by side in object storage.
  • Cheap and flexible, but raw data can become a hard to use data swamp without governance.

How they fit together

Many pipelines land raw data in a lake first, then refine and load curated tables into a warehouse for analysts. The lake is the durable source of truth, while the warehouse is the polished serving layer.

Key idea

A lake stores cheap raw data of any shape, while a warehouse stores structured data tuned for fast analytics.

Check yourself

Answer to earn rating on the learn ladder.

1. What defines schema on read?

2. Why can a data lake become a data swamp?