Compartments that contain damage
A bulkhead isolates resources into separate pools so a failure in one cannot drain the others. The name comes from ship compartments that stop a single breach from flooding the whole hull.
The problem it solves
If all requests share one thread pool and one downstream dependency slows down, every thread can end up parked waiting on that dependency. The healthy endpoints starve even though their own dependencies are fine. This is a cascading failure through a shared pool.
Sizing the compartments
- Give each dependency or endpoint its own bounded pool so a slow one can only exhaust its own slice.
- Size each pool from the dependency's expected concurrency, often using Little's law (average concurrency = throughput × latency) applied to its own throughput and latency.
- Leave a small reserve so a partial failure does not consume the entire allocation at once.
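The sizing rules above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the names `pool_size`, `Bulkhead`, and `BulkheadFull` are hypothetical, the traffic figures are made up, and treating the "reserve" as a few extra slots on top of the Little's law estimate is one possible reading of the third bullet. Rejecting callers immediately when a compartment is full is also an assumption; queueing with a timeout is an equally valid choice.

```python
import math
import threading
from contextlib import contextmanager


class BulkheadFull(Exception):
    """Raised when a compartment has no free slot (hypothetical name)."""


def pool_size(throughput_per_s: float, latency_s: float, reserve: int = 0) -> int:
    # Little's law: average concurrency = arrival rate * average time in system.
    # `reserve` adds a few spare slots on top (an assumed interpretation).
    return max(1, math.ceil(throughput_per_s * latency_s)) + reserve


class Bulkhead:
    """One bounded compartment per dependency, backed by a semaphore."""

    def __init__(self, name: str, limit: int):
        self.name = name
        self.limit = limit
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def slot(self):
        # Non-blocking acquire: a full compartment rejects the call
        # instead of parking the caller's thread.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull(self.name)
        try:
            yield
        finally:
            self._slots.release()


# Illustrative numbers: 40 requests/s at 0.25 s each is about 10 in flight.
payments = Bulkhead("payments", pool_size(40, 0.25, reserve=2))
search = Bulkhead("search", pool_size(8, 0.5))
```

A call site would wrap each outbound request in `with payments.slot(): ...` and handle `BulkheadFull` by failing fast or returning a fallback. Because `search` holds its own semaphore, a stalled payments dependency cannot take slots away from it.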
The tradeoff is that strict partitioning can leave some pools idle while another is full. A bit of slack or a shared overflow tier balances isolation against efficiency.
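One way to picture the shared overflow tier is a compartment that tries its own dedicated slots first and borrows from a small common semaphore only when those are exhausted. The sketch below assumes that design; `OverflowBulkhead` and its parameters are hypothetical names, and the sizes are arbitrary.

```python
import threading
from contextlib import contextmanager


class OverflowBulkhead:
    """Dedicated slots first, then a bounded tier shared by all compartments."""

    def __init__(self, dedicated: int, overflow: threading.BoundedSemaphore):
        self._dedicated = threading.BoundedSemaphore(dedicated)
        self._overflow = overflow  # the same object is passed to every compartment

    @contextmanager
    def slot(self):
        if self._dedicated.acquire(blocking=False):
            held = self._dedicated
        elif self._overflow.acquire(blocking=False):
            held = self._overflow
        else:
            raise RuntimeError("dedicated slots and overflow tier are both full")
        try:
            yield
        finally:
            # Release whichever semaphore this call actually took.
            held.release()


shared_tier = threading.BoundedSemaphore(1)
reports = OverflowBulkhead(dedicated=1, overflow=shared_tier)
checkout = OverflowBulkhead(dedicated=1, overflow=shared_tier)
```

Keeping the overflow tier small relative to the dedicated pools preserves the isolation guarantee: a slow dependency can consume its own slots plus the overflow, but never another compartment's dedicated slots.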
Key idea
Bulkheads give each dependency its own bounded pool so one slow dependency drains only its slice, trading some idle capacity for failure isolation.