The Lakehouse Architecture

Combining cheap lake storage with warehouse-style transactions and schema enforcement.

Why a lakehouse

A lakehouse blends the low cost and flexibility of a data lake with the reliability and performance of a data warehouse. The goal is one platform instead of copying data between two systems.

The key enabler

The trick is an open table format such as Delta Lake, Apache Iceberg, or Apache Hudi layered on top of object storage. These formats add a transaction log over plain files, which gives:

ACID transactions so concurrent writes do not corrupt tables.
Schema enforcement and evolution so columns stay consistent over time.
Time travel to query an older version of a table.
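
To make the transaction log idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the on-disk layout of Delta Lake, Iceberg, or Hudi: the class ToyTable, the _log directory name, and the JSON file layout are all hypothetical and chosen only for illustration. It assumes a local filesystem where exclusive file creation is atomic, which stands in for the atomic commit primitive a real format relies on.

```python
import json
import os
import tempfile
import uuid


class ToyTable:
    """Toy table (hypothetical): immutable data files plus an ordered commit log."""

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema  # column name -> expected Python type
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Each log entry is named <version>.json; the log is the source of truth.
        return sorted(int(n.split(".")[0]) for n in os.listdir(self.log_dir))

    def append(self, rows):
        # Schema enforcement: reject rows whose columns or types do not match.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"column {col!r} expects {typ.__name__}")
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        # Write the data file first; it stays invisible until a log entry names it.
        data_name = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # Commit: mode "x" raises FileExistsError if another writer already
        # claimed this version, so two writers cannot both commit version N.
        with open(os.path.join(self.log_dir, f"{version:05d}.json"), "x") as f:
            json.dump({"add": data_name}, f)
        return version

    def read(self, version=None):
        # Time travel: replay the log only up to the requested version.
        rows = []
        for v in self._versions():
            if version is not None and v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:05d}.json")) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows


table = ToyTable(tempfile.mkdtemp(), {"id": int, "name": str})
v0 = table.append([{"id": 1, "name": "a"}])
v1 = table.append([{"id": 2, "name": "b"}])
latest = table.read()              # both rows
as_of_v0 = table.read(version=v0)  # only the first row
```

The three guarantees map onto three small pieces: the exclusive create of the log entry gives a single winner per version, the type check in append enforces the schema before anything is committed, and read replays the log up to a chosen version. Real formats add much more, such as deletes, checkpoints, and statistics.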

What you gain

Run SQL analytics and machine learning on the same copy of data.
Keep data in cheap open storage, avoiding lock-in to one vendor.
Get warehouse-style consistency without a separate warehouse.

Trade-offs

A lakehouse engine can be slower than a tuned warehouse for some workloads, and the open formats add operational work of their own, such as compacting small files and maintaining table metadata.
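
As a rough illustration of what compaction involves, the sketch below plans which small files to merge into larger ones. The function plan_compaction, its parameters, and the greedy grouping strategy are assumptions made for this example; the real table formats ship their own maintenance operations with different policies.

```python
def plan_compaction(file_sizes, target_bytes):
    """Group small files into batches whose combined size fits in target_bytes.

    file_sizes maps a file name to its size in bytes. Files already at or
    above the target are left alone. Hypothetical helper for illustration.
    """
    small = sorted(
        (size, name) for name, size in file_sizes.items() if size < target_bytes
    )
    batches, current, current_size = [], [], 0
    for size, name in small:
        # Start a new batch when adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    # Rewriting a single file gains nothing, so keep only real merges.
    return [batch for batch in batches if len(batch) > 1]


plan = plan_compaction({"a": 10, "b": 20, "c": 30, "d": 200, "e": 90}, 100)
# plan == [["a", "b", "c"]]
```

In a table format, each planned batch would be rewritten as one larger file and committed as a new table version, which is part of the metadata upkeep mentioned above.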

Key idea