System Design

Warehouse vs Lake vs Lakehouse

Three ways to store analytical data, and why the lakehouse tries to blend the first two.


Three storage philosophies

Analytical data has to live somewhere, and three patterns dominate. They differ in how structured the data must be before it lands.

  • A data warehouse stores cleaned, structured tables optimized for SQL queries. The schema is defined before any data is loaded (schema on write), which keeps queries fast but makes ingestion rigid.
  • A data lake stores raw files of any shape in cheap object storage. The schema is applied only when the data is read (schema on read), which is flexible but makes it easy for the lake to turn into a swamp of junk that nobody can interpret.
  • A lakehouse keeps data in open file formats on a lake but adds a table layer on top that provides transactions, schema enforcement, and indexes. It aims for warehouse reliability at lake cost.
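The difference between schema on write and schema on read can be sketched in a few lines. This is a minimal illustration, not a real storage engine: the `ORDERS_SCHEMA` table, the function names, and the in-memory lists standing in for warehouse tables and lake files are all hypothetical.

```python
import json

# Hypothetical schema for an "orders" table: column name -> required Python type.
ORDERS_SCHEMA = {"order_id": int, "amount": float}


def warehouse_insert(table, row):
    """Schema on write: validate before storing, reject rows that do not fit."""
    if set(row) != set(ORDERS_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(row)}")
    for column, expected in ORDERS_SCHEMA.items():
        if not isinstance(row[column], expected):
            raise ValueError(f"{column} must be {expected.__name__}")
    table.append(row)


def lake_write(files, raw_line):
    """Schema on read, write side: store the raw text without inspecting it."""
    files.append(raw_line)


def lake_read(files):
    """Schema on read, read side: parse and coerce only when querying."""
    rows, skipped = [], 0
    for line in files:
        try:
            record = json.loads(line)
            rows.append({"order_id": int(record["order_id"]),
                         "amount": float(record["amount"])})
        except (ValueError, KeyError, TypeError):
            skipped += 1  # malformed data surfaces only now, at query time
    return rows, skipped
```

In this sketch the warehouse path fails loudly at ingestion, while the lake path accepts anything and pays the cost later: every reader has to decide how to parse the data and what to do with lines that do not fit.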

How they relate

The lakehouse is a response to teams that ran both a lake and a warehouse and paid to copy data between them. A metadata layer over the lake files lets you query raw and curated data in one place, without keeping a second copy.
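One way to picture the metadata layer is as a schema plus an append-only log that records which files belong to the table. The toy class below is a sketch under that assumption only; `LakehouseTable`, its methods, and the dict standing in for object storage are hypothetical, and it does not reproduce how any particular table format works.

```python
class LakehouseTable:
    """Toy table layer: a schema plus an append-only commit log over data files."""

    def __init__(self, schema):
        self.schema = schema  # column name -> required Python type
        self.files = {}       # stands in for object storage: path -> rows
        self.log = []         # each commit is a list of file paths

    def commit(self, path, rows):
        """Validate rows, write the file, then publish it with one log entry."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"{path}: columns do not match table schema")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise ValueError(f"{path}: {column} must be {expected.__name__}")
        self.files[path] = rows
        self.log.append([path])  # the file becomes visible to readers only here
        return len(self.log)     # new table version

    def scan(self, version=None):
        """Read every row from files committed up to the given version."""
        commits = self.log if version is None else self.log[:version]
        return [row for commit in commits for path in commit
                for row in self.files[path]]
```

Readers in this sketch go through the log rather than listing storage directly, so a stray or half-written file is never returned by `scan`, and passing an older `version` reads the table as it was at that commit.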

Key idea

Warehouses enforce schema on write for speed, lakes defer schema to read time for flexibility, and the lakehouse adds a table layer over lake storage in an attempt to get both at once.

Check yourself


1. What does schema on read mean?

2. What problem does a lakehouse mainly try to solve?