Address equals content
In a content-addressed store, the key of a blob is the hash of its bytes (in practice a cryptographic hash such as SHA-256). The name is not chosen; it is derived. Give the store the same bytes and you always get the same key. This single rule yields several properties at once.
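The rule can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a reference implementation: the names BlobStore, content_key, put, and get are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive the key from the bytes themselves (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class BlobStore:
    """Hypothetical in-memory store: callers never choose a key."""

    def __init__(self):
        self._blobs = {}  # maps hex digest -> bytes

    def put(self, data: bytes) -> str:
        key = content_key(data)
        self._blobs[key] = data
        return key  # the caller learns the address only after storing

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = BlobStore()
first = store.put(b"hello")
second = store.put(b"hello")
print(first == second)  # same bytes in, same key out
```

Note that put takes no name argument. That absence is the whole design: the address is a function of the content and nothing else.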
Properties that fall out
- Automatic deduplication: identical content produces an identical key, so it is stored once.
- Integrity for free: a reader rehashes the bytes and checks that the result matches the key; any mismatch means the data was corrupted or tampered with.
- Immutability: changing the bytes changes the key, so an address can never silently point at different data.
- Cacheability: because a key never changes meaning, the content behind it can be cached indefinitely without revalidation.
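Three of these properties can be observed directly in a small sketch. The class VerifyingStore and its methods are hypothetical names, SHA-256 is an assumed hash, and the corruption is simulated by overwriting the internal dictionary by hand.

```python
import hashlib


class VerifyingStore:
    """Hypothetical in-memory store that rehashes on every read."""

    def __init__(self):
        self._blobs = {}  # maps hex digest -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Deduplication: a second put of identical bytes finds the key present.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Integrity: the bytes must still hash to the key they are filed under.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored bytes do not match key " + key)
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = VerifyingStore()
key_a = store.put(b"report v1")
key_b = store.put(b"report v1")  # identical content, stored once
key_c = store.put(b"report v2")  # changed bytes, so a different key

print(key_a == key_b, len(store))  # True 2
print(key_a != key_c)              # True

# Simulate on-disk corruption by overwriting the stored bytes directly.
store._blobs[key_a] = b"report vX"
try:
    store.get(key_a)
except ValueError:
    print("corruption detected")
```

The check in get is what lets a reader trust bytes fetched from an untrusted cache or peer: the key itself is the checksum.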
The tradeoff
You lose mutable names. To track something that evolves, you add a separate mutable pointer, a name-to-hash mapping, and update it to point at new content. Versioned manifests and many distributed file systems work this way.
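A minimal sketch of such a pointer layer, assuming an in-memory store and SHA-256. The names blobs, refs, put, update_ref, and resolve are hypothetical and chosen only for illustration.

```python
import hashlib

blobs = {}  # immutable layer: hex digest -> bytes
refs = {}   # mutable layer: human-chosen name -> hex digest


def put(data: bytes) -> str:
    """Store bytes under their own hash and return that key."""
    key = hashlib.sha256(data).hexdigest()
    blobs[key] = data
    return key


def update_ref(name: str, data: bytes) -> str:
    """Store new content, then repoint the name at it."""
    key = put(data)
    refs[name] = key  # the only mutation in the whole system
    return key


def resolve(name: str) -> bytes:
    """Follow the name to its current key, then the key to its bytes."""
    return blobs[refs[name]]


old_key = update_ref("config", b"timeout=10")
new_key = update_ref("config", b"timeout=30")

print(resolve("config"))  # b'timeout=30'
print(blobs[old_key])     # b'timeout=10'
```

The old version stays reachable by its key after the name moves on, which is why this split between a mutable name table and immutable blobs suits versioned data.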
Key idea
A content-addressed store names blobs by the hash of their bytes, giving deduplication, integrity, and immutability for free at the cost of mutable names.