System Design

The Content-Addressed Store Revisited

Name each blob by the hash of its bytes, so the key verifies the content and deduplication comes for free.

Address equals content

In a content-addressed store, the key of a blob is the hash of its bytes. The name is not chosen; it is derived, normally with a cryptographic hash so that two different contents do not collide in practice. Give the store the same bytes and you always get the same key. This single rule yields several properties at once.
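
A minimal sketch of key derivation, assuming SHA-256 as the hash function. `content_key` is a hypothetical helper name, not the API of any particular system. It shows that identical bytes always map to the same key and that a one-byte change produces a different key.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive a blob's key from its bytes (SHA-256 hex digest is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


a = content_key(b"hello world")
b = content_key(b"hello world")
c = content_key(b"hello world!")
print(a == b)  # True: same bytes, same key
print(a == c)  # False: different bytes, different key
```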

Properties that fall out

  • Automatic deduplication: identical content produces an identical key, so it is stored once.
  • Integrity for free: a reader rehashes the bytes and checks that the result matches the key, so corruption or tampering shows up as a mismatch.
  • Immutability: changing the bytes changes the key, so an address can never silently point at different data.
  • Cacheability: because a key's meaning never changes, the blob behind it can be cached indefinitely with no invalidation.
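
The first two properties can be seen in a small in-memory store. This is a hypothetical sketch, not a production design. It assumes SHA-256 and a Python dict as the backing storage, and `ContentStore`, `put`, and `get` are illustrative names. `put` deduplicates because the key comes from the bytes. `get` rehashes on every read and rejects data that no longer matches its key.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory content-addressed store (SHA-256 assumed)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Deduplication: identical bytes yield the same key, so storing twice is a no-op.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Integrity: rehash on read and refuse bytes that no longer match their key.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"corrupt blob {key}")
        return data


store = ContentStore()
k1 = store.put(b"report v1")
k2 = store.put(b"report v1")
print(k1 == k2, len(store._blobs))  # True 1

# Simulate storage corruption by overwriting the stored bytes directly.
store._blobs[k1] = b"report v2"
try:
    store.get(k1)
except ValueError as err:
    print("detected:", err)
```

Because a key is never reused for different bytes, any layer in front of `get` could cache results by key without ever invalidating them.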

The tradeoff

You give up mutable names. To track something that evolves, you add a separate mutable pointer (a name-to-hash mapping) and update it to point at each new version's hash. Versioned manifests and many distributed file systems follow this pattern. Git is a familiar example: a branch is a mutable ref that points at an immutable, hash-named commit.
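
A minimal sketch of the two layers, assuming SHA-256 and plain dicts for storage. `blobs`, `refs`, `publish`, and `resolve` are hypothetical names chosen for illustration. The blob layer is only ever added to. Publishing a new version stores new content and then moves the mutable name to the new hash, so older versions stay addressable by their hashes.

```python
import hashlib

blobs: dict[str, bytes] = {}  # immutable layer: content hash -> bytes
refs: dict[str, str] = {}     # mutable layer: human-readable name -> content hash


def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    blobs.setdefault(key, data)  # never overwrite: a key always means the same bytes
    return key


def publish(name: str, data: bytes) -> str:
    """Store the new content, then repoint the mutable name at its hash."""
    key = put(data)
    refs[name] = key
    return key


def resolve(name: str) -> bytes:
    # Follow the mutable pointer, then fetch the immutable blob.
    return blobs[refs[name]]


old = publish("config", b"timeout=30")
new = publish("config", b"timeout=60")
print(resolve("config"))  # b'timeout=60'
print(blobs[old])         # b'timeout=30' (the old version is still addressable)
```

In a shared setting, the pointer update is the only write that changes existing state, so it is the natural place for coordination (for example, a compare-and-swap on the ref). The blobs themselves never change.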

Key idea

A content-addressed store names blobs by the hash of their bytes. This gives deduplication, integrity, and immutability for free, at the cost of mutable names.

Check yourself

1. What is the key of a blob in a content-addressed store?

2. What does a content-addressed store give up?