Surrogate Versus Natural Keys
A primary key can be natural or surrogate, and the choice shapes the whole schema.
A natural key is data that already identifies the row in the real world, such as an email address, a social security number, or a country code. A surrogate key is a value the system invents purely to identify the row, such as an auto incrementing integer or a generated identifier, with no business meaning.
Natural keys read well and avoid an extra column, but they have risks:
- They can change, and a changing primary key ripples through every foreign key.
- They may not be truly unique, or uniqueness can be revoked.
- They may be wide or sensitive, which bloats indexes and leaks data.
Surrogate keys avoid those problems because they are stable, compact, and meaningless. Nothing in the business will ever force a surrogate id to change. The cost is an extra column and the loss of human readability, plus the temptation to forget that the natural attribute still needs a unique constraint of its own.
A common pattern is to use a surrogate key as the primary key for stability and joins, while adding a separate unique constraint on the natural attribute to enforce real world uniqueness. You get stability from one and correctness from the other.
Key idea
Surrogate keys give stable meaningless identity ideal for joins, while natural attributes still deserve a unique constraint to enforce real world uniqueness.