The Data Lake and Warehouse

Contrast raw, schema-on-read data lakes with structured, schema-on-write warehouses.

Two storage philosophies

  • A data warehouse stores cleaned, structured data in a fixed schema, optimized for fast analytical SQL. You define the schema before loading, so this is schema-on-write.
  • A data lake stores raw data of any shape in cheap object storage. You impose structure only when you read, so this is schema-on-read.
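
The difference is easiest to see in where validation happens. The following is a minimal sketch in plain Python, not a real warehouse or lake API: `ORDERS_SCHEMA`, `conform`, `Warehouse`, and `Lake` are all hypothetical names invented for illustration, and in-memory lists stand in for real storage.

```python
import json

# Hypothetical schema for an "orders" dataset: column name -> required Python type.
ORDERS_SCHEMA = {"order_id": int, "amount": float}


def conform(record, schema):
    """Return a row with exactly the schema's columns, or raise ValueError."""
    if not isinstance(record, dict):
        raise ValueError("record is not an object")
    row = {}
    for column, expected_type in schema.items():
        if not isinstance(record.get(column), expected_type):
            raise ValueError(
                f"column {column!r} is missing or not {expected_type.__name__}"
            )
        row[column] = record[column]
    return row


class Warehouse:
    """Schema-on-write: the schema is fixed up front and checked at load time."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def load(self, record):
        # A bad record is rejected here, before it is ever stored.
        self.rows.append(conform(record, self.schema))


class Lake:
    """Schema-on-read: store raw text as-is; apply a schema only when reading."""

    def __init__(self):
        self.blobs = []

    def put(self, raw_text):
        # Nothing is parsed or checked on the way in.
        self.blobs.append(raw_text)

    def read(self, schema):
        # Structure is imposed now; records that do not fit are skipped.
        for raw_text in self.blobs:
            try:
                yield conform(json.loads(raw_text), schema)
            except ValueError:  # also covers malformed JSON
                continue
```

In this sketch `Warehouse.load` raises on a malformed record, so bad data never lands, while `Lake.put` accepts anything and `Lake.read` pays the parsing and filtering cost on every query. The lake also keeps the rejected raw text, so a different schema can be applied to it later.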

Tradeoffs

  • Warehouse: fast queries and strong governance, but ingestion is rigid and storing everything is expensive.
  • Lake: cheap, flexible, and keeps raw fidelity for unknown future uses, but querying is slower and, without governance, data quality can decay until the lake becomes a data swamp.

The lakehouse

A lakehouse combines both. It keeps data in open columnar files on object storage but adds a table layer on top that provides transactions, schema enforcement, and warehouse-like query speed over the lake.
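
One way to picture the table layer is as a commit log sitting next to the data files: writers validate rows, write a new file, and then record it in the log, and readers trust only what the log lists. The sketch below is a toy under stated assumptions, not the layout of any real table format. `TinyTable` and the `_log.jsonl` file are hypothetical, JSON files stand in for columnar files, a local directory stands in for object storage, and appending one line to the log stands in for an atomic commit.

```python
import json
import uuid
from pathlib import Path


class TinyTable:
    """Toy table layer: data files plus an append-only commit log."""

    def __init__(self, root, schema):
        self.root = Path(root)
        self.schema = schema  # column name -> required Python type
        self.log = self.root / "_log.jsonl"  # hypothetical log file name
        self.root.mkdir(parents=True, exist_ok=True)
        self.log.touch()

    def append(self, rows):
        # Schema enforcement: reject the whole batch before writing anything.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"unexpected columns: {sorted(row)}")
            for column, expected_type in self.schema.items():
                if not isinstance(row[column], expected_type):
                    raise ValueError(f"bad type for column {column!r}")
        # Write the data file first; it is invisible to readers for now.
        name = f"part-{uuid.uuid4().hex}.json"
        (self.root / name).write_text(json.dumps(rows))
        # Commit: one appended log line makes the file part of the table.
        with self.log.open("a") as log_file:
            log_file.write(json.dumps({"add": name}) + "\n")

    def scan(self):
        # Readers follow the log and ignore any file it does not mention.
        rows = []
        for line in self.log.read_text().splitlines():
            name = json.loads(line)["add"]
            rows.extend(json.loads((self.root / name).read_text()))
        return rows
```

A file left behind by a failed or half-finished write never appears in `scan()` because it was never logged, which is the transactional behavior a bare directory of files lacks. The query-speed side of a lakehouse (columnar layout, file statistics, caching) is outside what this sketch shows.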

When to use which

  • Use a warehouse for trusted business reporting on well-defined metrics.
  • Use a lake for raw logs, machine learning features, and exploratory data whose schema is not yet fixed.

Key idea

A warehouse enforces schema-on-write for fast, governed analytics, while a lake stores raw data for schema-on-read flexibility. The lakehouse pattern blends the two on cheap object storage.

Check yourself

1. What does schema-on-read mean?

2. What does a lakehouse add to a data lake?