Addressing by Content, Not Location
In ordinary storage you pick a name, such as a path, and the bytes live wherever you put them. Content-addressed storage, or CAS, flips this: the address of a blob is the cryptographic hash of its own bytes. The data names itself.
What This Buys You
- Deduplication is automatic. Two identical files hash to the same address, so they are stored once no matter how many times they are uploaded.
- Integrity is built in. Rehashing the bytes and comparing the result to the address detects corruption or tampering, provided the hash function is collision resistant.
- Immutability follows naturally. Changing the content changes the hash, so an address always points to exactly one version of the bytes.
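The three properties above can be seen in a few lines of code. What follows is a minimal sketch, not a production design: it assumes SHA-256 as the hash and an in-memory dictionary as the backing store, and the ContentStore class with its put and get methods is a hypothetical name chosen for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}  # address (hex digest) -> bytes

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves.
        address = hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same key, so a repeat upload stores nothing new.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Integrity check: rehash the stored bytes and compare with the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("blob does not match its address")
        return data

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
a = store.put(b"hello world")
b = store.put(b"hello world")   # same bytes, same address, stored once
c = store.put(b"hello world!")  # different bytes, different address
```

Here a and b are equal and the store holds two blobs, not three. If the stored bytes were altered behind the store's back, get would raise instead of returning them.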
Where It Shows Up
Git stores objects by the hash of their content. Docker layers, IPFS, and many backup systems work the same way. The cost is that you cannot edit in place: a new version is a new address, and you need a separate mutable layer of references to track what the current version is.
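The split between immutable blobs and a mutable layer of references can be sketched as two dictionaries. This is an illustrative assumption about how such a layer might look, not how any of the systems named above implement it; the names blobs, refs, put, and update are hypothetical, and SHA-256 is assumed as the hash.

```python
import hashlib

blobs = {}  # immutable layer: address -> bytes
refs = {}   # mutable layer: human-readable name -> current address


def put(data: bytes) -> str:
    """Store bytes under the hash of their content and return that address."""
    address = hashlib.sha256(data).hexdigest()
    blobs.setdefault(address, data)
    return address


def update(name: str, data: bytes) -> str:
    """Store a new version and repoint the name at it."""
    address = put(data)
    refs[name] = address  # only the reference changes; old blobs stay put
    return address


v1 = update("config", b"debug=false")
v2 = update("config", b"debug=true")
current = blobs[refs["config"]]
```

After the second update, the name "config" resolves to the new bytes, while the first version is still retrievable by its own address v1. An "edit" is therefore a new blob plus a pointer move.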
Key idea
Content-addressed storage names a blob by the hash of its bytes, giving automatic deduplication, built-in integrity checks, and immutability at the cost of in-place edits.