Why Normalize
Normalization organizes columns and tables so each fact is stored once. Redundant data causes anomalies, where updating one copy and missing another leaves the database inconsistent.
The Common Forms
- First normal form requires each column to hold a single atomic value, with no repeating groups or lists in a cell.
- Second normal form removes partial dependencies, so every non key column depends on the whole primary key, not just part of it.
- Third normal form removes transitive dependencies, so non key columns depend only on the key and not on each other.
The Payoff and the Cost
- Less redundancy means a fact changes in one place.
- More tables can mean more joins to assemble a full record.
Denormalization
Sometimes you deliberately store redundant data to speed reads. This denormalization trades clean structure for fewer joins and must be kept in sync carefully.
Key idea
Normalization splits data so each fact lives once, avoiding update anomalies, while denormalization deliberately reintroduces redundancy to cut joins when reads dominate.