The architecture
A cell is a complete, self-contained copy of the stack (load balancer, services, and data) that serves a slice of users. The system is built from many cells, and each user is assigned to exactly one cell.
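A minimal Python sketch of this model follows. It assumes users have already been assigned to cells by some external process. Because each user maps to exactly one cell, the share of users hit by one cell's failure is that cell's user count divided by the total. The `Cell` fields and the `blast_radius` helper are hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Cell:
    """One complete copy of the stack. The field names are illustrative only."""
    name: str
    load_balancer: str
    services: Tuple[str, ...]
    database: str


def blast_radius(assignment: Dict[str, str], failed_cell: str) -> float:
    """Fraction of users affected if `failed_cell` goes down.

    `assignment` maps each user ID to exactly one cell name, so a user is
    affected only when their own cell fails.
    """
    if not assignment:
        return 0.0
    users_per_cell = Counter(assignment.values())
    # Counter returns 0 for a cell with no users, so unknown cells affect nobody.
    return users_per_cell[failed_cell] / len(assignment)
```

For example, if 1,000 users are spread evenly over four such cells, losing any one cell affects a quarter of them. Adding cells shrinks the share that a single failure can reach.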
Why cells limit blast radius
A bad deploy, a poison message, or an overload usually stays confined to the cell where it occurs.
- A failing cell affects only its users, not the whole fleet.
- You can roll out changes cell by cell and stop if one breaks.
- Capacity grows by adding cells, each a known, tested unit.
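The cell-by-cell rollout described above can be sketched as a loop that deploys to one cell, checks it, and halts at the first unhealthy cell. This is a minimal sketch. `deploy` and `is_healthy` are hypothetical hooks standing in for whatever deployment tooling and health checks a real system uses.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rollout(
    cells: Iterable[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Deploy to one cell at a time and stop at the first failing cell.

    Returns (cells that were deployed and passed, the cell that failed or None).
    """
    passed: List[str] = []
    for cell in cells:
        deploy(cell)
        if not is_healthy(cell):
            # Halt here: the bad change is confined to this one cell, and the
            # cells not yet reached keep running the previous version.
            return passed, cell
        passed.append(cell)
    return passed, None
```

A production rollout would likely also wait between cells and roll back the failed cell. Both steps are left out here to keep the halting logic visible.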
What it demands
- A router maps each user or tenant to a cell and must itself stay highly available.
- Cells should share no stateful components, such as databases, caches, or queues; any shared state lets a failure leak from one cell into others.
- Cross-cell operations are awkward and should be rare.
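The router can be sketched as a sticky placement table. This is one possible design, assumed for illustration rather than taken from the text. A tenant seen for the first time goes to the cell with the fewest tenants, and that choice is recorded so the tenant never moves implicitly. Existing tenants therefore stay put when a cell is added, which fits growing capacity by adding cells. `CellRouter`, `add_cell`, and `cell_for` are hypothetical names. The table here lives in memory. A highly available router would need to keep it in replicated storage, and this sketch does not show that.

```python
from typing import Dict, Iterable


class CellRouter:
    """Maps each tenant to exactly one cell (illustrative sketch)."""

    def __init__(self, cells: Iterable[str]):
        # Number of tenants placed in each cell; dict order breaks ties.
        self._load: Dict[str, int] = {cell: 0 for cell in cells}
        if not self._load:
            raise ValueError("router needs at least one cell")
        # tenant_id -> cell name, recorded once and then reused.
        self._placement: Dict[str, str] = {}

    def add_cell(self, cell: str) -> None:
        """Register a new, empty cell. New tenants will favor it."""
        self._load.setdefault(cell, 0)

    def cell_for(self, tenant_id: str) -> str:
        cell = self._placement.get(tenant_id)
        if cell is None:
            # First request from this tenant: pick the least-loaded cell.
            cell = min(self._load, key=self._load.__getitem__)
            self._placement[tenant_id] = cell
            self._load[cell] += 1
        return cell
```

Sticky placement is used here instead of a pure hash of the tenant ID. A hash taken modulo the cell count would silently reassign many existing tenants whenever a cell is added.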
Cells trade some efficiency and operational complexity for strong fault containment.
Key idea
Build the system from independent cells so a failure is trapped inside one cell.