Denormalization For Reads
Denormalization deliberately reintroduces redundancy that normalization removed, in exchange for faster reads. Fully normalized schemas can force many joins to assemble one view, which becomes expensive under heavy read traffic.
Common denormalization techniques:
- Duplicated columns copy a parent value onto the child, such as storing the author name on each post to skip a join.
- Precomputed aggregates store a running count or sum, such as a comment count on a post, so you do not aggregate on every read.
- Materialized views persist the result of an expensive query and refresh it periodically.
The cost is that each duplicated fact now lives in more than one place. When the source changes you must update every copy, or the copies drift and become inconsistent. This shifts complexity from read time to write time, and someone must own keeping copies in sync, often a trigger, an application hook, or a scheduled job.
The decision is a tradeoff. Denormalize when reads vastly outnumber writes, when the joins are measurably slow, and when slightly stale data is acceptable. Keep the normalized tables as the source of truth and treat denormalized copies as a derived cache. That way you can always rebuild the redundant data from the authoritative rows.
Key idea
Denormalization trades extra write complexity and consistency risk for faster reads, so apply it only where reads dominate and keep a normalized source of truth.