System Design

The Medallion Architecture: Bronze, Silver, Gold

Layering data refinement into raw, cleaned, and business-ready tiers.


A layered refinement model

The medallion architecture organizes a lakehouse into three quality tiers. Data flows through them in order and gets cleaner at each step.

The three layers

  • Bronze holds raw ingested data, kept as close to the source as possible. It is append-only and serves as an auditable history.
  • Silver holds cleaned and conformed data: deduplicated, type-cast, joined, and validated. It is the trustworthy working layer for engineers.
  • Gold holds business-level aggregates and curated tables shaped for specific dashboards, reports, or machine learning features.
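The three layers above can be sketched in a few lines of plain Python. This is a minimal illustration, not a lakehouse implementation: the order records, the field names, and the helpers `to_silver` and `to_gold` are all hypothetical, and in-memory lists stand in for real tables.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (hypothetical order events).
bronze = [
    {"order_id": "1", "amount": "10.50", "country": "us"},
    {"order_id": "1", "amount": "10.50", "country": "us"},  # duplicate delivery
    {"order_id": "2", "amount": "4.00", "country": "DE"},
    {"order_id": "3", "amount": "oops", "country": "de"},   # fails validation
]

def to_silver(raw_rows):
    """Deduplicate, cast types, and drop rows that fail validation."""
    seen, clean = set(), []
    for row in raw_rows:
        if row["order_id"] in seen:
            continue  # skip repeated deliveries of the same order
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # invalid rows remain in bronze for later inspection
        seen.add(row["order_id"])
        clean.append({
            "order_id": int(row["order_id"]),
            "amount": amount,
            "country": row["country"].upper(),  # conform country codes
        })
    return clean

def to_gold(silver_rows):
    """Aggregate revenue per country, shaped for a hypothetical dashboard."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze is left untouched throughout: the duplicate and the invalid row are filtered on the way to silver, not deleted from the raw history.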

Why layer this way

  • Reprocessing is safe because bronze always retains the raw data, so you can rebuild silver and gold at any time.
  • Separation of concerns lets ingestion, cleaning, and business logic evolve independently.
  • Trust increases as you move up, so consumers know gold tables are ready to use.
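The reprocessing point is easiest to see with a cleaning bug. The sketch below assumes hypothetical sensor readings and two hypothetical parsing rules, `parse_v1` and `parse_v2`. Because silver is derived purely from bronze, fixing the rule and rerunning the transform is enough to repair it.

```python
# Bronze is only ever appended to; records are never edited in place.
bronze = []

def ingest(record):
    bronze.append(dict(record))  # store a copy exactly as it arrived

def build_silver(raw_rows, parse_temp):
    """Derive silver entirely from bronze using the given cleaning rule."""
    return [
        {"sensor": r["sensor"], "temp_c": parse_temp(r["temp"])}
        for r in raw_rows
    ]

def parse_v1(text):
    # First attempt: strips the unit suffix but forgets to convert Fahrenheit.
    return float(text.rstrip("F"))

def parse_v2(text):
    # Fixed rule: convert Fahrenheit readings to Celsius.
    if text.endswith("F"):
        return round((float(text[:-1]) - 32) * 5 / 9, 1)
    return float(text)

ingest({"sensor": "a", "temp": "21.5"})
ingest({"sensor": "b", "temp": "70.7F"})

silver_buggy = build_silver(bronze, parse_v1)  # sensor b is wrong here
silver = build_silver(bronze, parse_v2)        # rebuilt from the same raw history
```

Had the first transform overwritten the raw strings, the unit suffix would be gone and the fix would be impossible. Keeping bronze intact is what makes the rebuild safe.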

Practical notes

Each layer is usually a set of tables in the same storage system, and scheduled transforms promote data from one layer to the next. Avoid letting consumers read bronze directly: raw data is messy and its schema can change.
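One way to picture these notes is a single promotion job plus a guarded read path. The following is a rough sketch under stated assumptions: the `tables` dictionary stands in for shared storage, the `layer.table` naming scheme is hypothetical, and `promote` would be triggered by whatever scheduler you already use.

```python
from collections import Counter

# Hypothetical shared storage: every layer lives in the same place,
# distinguished only by a "layer." prefix on the table name.
tables = {
    "bronze.events": [
        {"day": "2024-01-01", "user": "u1"},
        {"day": "2024-01-01", "user": "u1"},  # duplicate
        {"day": "2024-01-01", "user": "u2"},
        {"day": "2024-01-02", "user": None},  # invalid: missing user
    ],
    "silver.events": [],
    "gold.daily_active_users": [],
}

def promote():
    """One scheduled run: rebuild silver from bronze, then gold from silver."""
    valid = [r for r in tables["bronze.events"] if r["user"] is not None]
    unique = sorted({(r["day"], r["user"]) for r in valid})
    tables["silver.events"] = [{"day": d, "user": u} for d, u in unique]

    counts = Counter(r["day"] for r in tables["silver.events"])
    tables["gold.daily_active_users"] = [
        {"day": d, "users": n} for d, n in sorted(counts.items())
    ]

def read_for_consumer(name):
    """Read path for downstream consumers; raw tables are off limits."""
    if name.startswith("bronze."):
        raise PermissionError(f"{name} is raw; read a silver or gold table instead")
    return tables[name]

promote()
report = read_for_consumer("gold.daily_active_users")
```

In a real platform the guard would more likely be a permission on the bronze schema than a check in application code, but the effect is the same: consumers only ever see refined tables.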

Key idea

Medallion layers refine data step by step, from raw bronze to clean silver to business-ready gold.

Check yourself


1. What does the bronze layer hold?

2. Why keep raw data in bronze even after cleaning?