System Design

The Content-Addressed Storage Idea

Naming a blob by the hash of its bytes gives you dedup and tamper detection for free.

Addressing by Content Not Location

In ordinary storage you pick a name, such as a file path, and the bytes live wherever you put them. Content-addressed storage (CAS) flips this: the address of a blob is the cryptographic hash of its own bytes. The data names itself.
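
As a minimal sketch of this idea, the address can be computed directly from the bytes. The lesson does not name a hash function, so SHA-256 is an assumption here, and the function name content_address is hypothetical.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the hex SHA-256 digest of the bytes, used here as the address.

    SHA-256 is an assumption; any cryptographic hash would play the same role.
    """
    return hashlib.sha256(data).hexdigest()


# The same bytes always produce the same address.
addr_a = content_address(b"hello world")
addr_b = content_address(b"hello world")

# Changing even one byte produces a different address.
addr_c = content_address(b"hello world!")
```

Note that the caller never chooses the name: the address is fully determined by the content.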

What This Buys You

  • Deduplication is automatic. Two identical files hash to the same address, so they are stored once no matter how many times they are uploaded.
  • Integrity is built in. Rehashing the bytes and comparing the result to the address detects corruption or tampering, provided the address itself came from a trusted source.
  • Immutability follows naturally. Changing the content changes the hash, so an address always points to exactly one version of the bytes (assuming a collision-resistant hash).
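
The properties above can be illustrated with a small in-memory sketch. It assumes SHA-256 and a plain dictionary as the backing store; BlobStore, put, get, and IntegrityError are illustrative names, not the API of any real system.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when stored bytes no longer match their address."""


class BlobStore:
    """Toy content-addressed store backed by a dict (illustrative only)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes, never chosen by the caller.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Rehash on read: a mismatch means the stored bytes were altered.
        if hashlib.sha256(data).hexdigest() != address:
            raise IntegrityError(f"blob {address} failed verification")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")   # duplicate upload, no new entry
other = store.put(b"other bytes")   # different content, different address
```

Here first and second are the same address and the store holds two blobs in total. If the bytes under an address were altered behind the store's back, get would raise instead of returning bad data.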

Where It Shows Up

Git stores objects by the hash of their content. Docker layers, IPFS, and many backup systems work the same way. The cost is that you cannot edit in place: a new version is a new address, so you need a separate mutable layer of references to track which address is the current version. In Git, branch names play this role.
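
The split between immutable blobs and mutable references can be sketched as two dictionaries. This is a simplified illustration under stated assumptions (SHA-256, in-memory storage); blobs, refs, put, update_ref, and read_ref are hypothetical names, and real systems add persistence, concurrency control, and garbage collection.

```python
import hashlib

blobs: dict[str, bytes] = {}   # immutable layer: address -> bytes
refs: dict[str, str] = {}      # mutable layer: human-chosen name -> address


def put(data: bytes) -> str:
    """Store bytes under their SHA-256 address and return that address."""
    address = hashlib.sha256(data).hexdigest()
    blobs.setdefault(address, data)
    return address


def update_ref(name: str, data: bytes) -> str:
    """Write a new version and repoint the name at it."""
    address = put(data)
    refs[name] = address       # only the reference changes; old blobs stay
    return address


def read_ref(name: str) -> bytes:
    """Resolve a name to its current address, then fetch the bytes."""
    return blobs[refs[name]]


v1 = update_ref("config", b"timeout=30")
v2 = update_ref("config", b"timeout=60")
current = read_ref("config")
```

After the second update, the name "config" resolves to the new bytes, while the first version is still retrievable by its address v1. An "edit" is therefore a new blob plus a reference update, never a change to existing bytes.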

Key idea

Content-addressed storage names a blob by the hash of its bytes, giving automatic deduplication, built-in integrity checks, and immutability at the cost of in-place edits.

Check yourself

1. How is a blob addressed in content-addressed storage?

2. Why does content addressing give automatic deduplication?

3. What is a cost of content-addressed storage?