Why a single cache is not enough
A cache stores hot data in memory so reads skip the slow database. One cache box runs out of memory and becomes a single point of failure. A distributed cache spreads keys across many nodes for capacity and resilience.
Placing keys on nodes
The challenge is deciding which node holds which key. Plain modulo hashing reshuffles everything when a node is added or removed.
Consistent hashing maps keys and nodes onto a ring. A key belongs to the next node clockwise. Adding or removing a node only moves the keys near it, not the whole keyspace, keeping the cache mostly warm.
Eviction
Cache memory is finite, so it must evict entries:
- Least recently used drops the entry untouched the longest.
- Time to live expires entries after a set age.
- These keep the cache holding the most useful data.
Consistency concerns
- A cached value can go stale when the source changes, so writes invalidate or update the cache.
- A cache miss falls through to the database, which then populates the cache.
Key idea
A distributed cache spreads keys across nodes using consistent hashing so membership changes move few keys, applies eviction like least recently used, and invalidates on writes to limit staleness.