A layered refinement model
The medallion architecture organizes a lakehouse into three quality tiers that data flows through, getting cleaner at each step.
The three layers
- Bronze holds raw ingested data, kept as close to the source as possible. It is append only and serves as an auditable history.
- Silver holds cleaned and conformed data: deduplicated, type cast, joined, and validated. It is the trustworthy working layer for engineers.
- Gold holds business level aggregates and curated tables shaped for specific dashboards, reports, or machine learning features.
Why layer this way
- Reprocessing is safe because bronze always retains the raw truth, so you can rebuild silver and gold anytime.
- Separation of concerns lets ingestion, cleaning, and business logic evolve independently.
- Trust increases as you move up, so consumers know gold tables are ready to use.
Practical notes
Each layer is usually a set of tables in the same storage, promoted by scheduled transforms. Avoid letting consumers read bronze directly, since raw data is messy and changes shape.
Key idea
Medallion layers refine data step by step from raw bronze to clean silver to business ready gold.