A switch that protects callers
A circuit breaker stops calls to a failing dependency so the caller fails fast instead of queuing requests against a dead service. It has three states: closed (passing traffic), open (rejecting calls immediately), and half-open (testing whether the dependency has recovered).
The tuning knobs
- A failure threshold decides how many failures over a window trip the breaker. Too low and it flaps on noise; too high and it trips too late.
- An open duration sets how long the breaker stays open before testing. Too short and it slams the recovering service; too long and callers keep being rejected after the dependency has recovered.
- A half-open trial sends a few probe requests; success closes the breaker, failure reopens it.
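The three knobs map onto a small state machine. The sketch below is one minimal way to wire them together, not a reference implementation: the names CircuitBreaker, failure_threshold, open_duration, trial_successes and the injectable clock are all assumptions made for illustration. To stay short it counts consecutive failures rather than failures over a window.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    # All parameter names and defaults here are illustrative assumptions.
    def __init__(self, failure_threshold=5, open_duration=30.0,
                 trial_successes=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures that trip
        self.open_duration = open_duration          # seconds to stay open
        self.trial_successes = trial_successes      # probes needed to close again
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_duration:
                raise CircuitOpenError("breaker is open")
            # Open duration has elapsed: let probe requests through.
            self.state = State.HALF_OPEN
            self.successes = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        if self.state is State.HALF_OPEN:
            self._trip()  # a failed probe reopens the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.trial_successes:
                self.state = State.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # a success breaks the failure streak

    def _trip(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failures = 0
```

Passing the clock in as a parameter is a design choice for testability: the open duration can be exercised without sleeping. This sketch also does not limit how many concurrent probes run while half-open, which a production breaker would usually do.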
Measuring on rate, not count
Tune on a failure rate over a window rather than a raw count, so a busy service and a quiet one trip on comparable conditions. Require a minimum request volume before judging, or a couple of early failures will trip a barely used breaker.
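A rate-based trip condition with a volume floor could look roughly like the following. This is a sketch under assumptions: FailureRateWindow, window_seconds, min_requests and rate_threshold are hypothetical names, and the default values are placeholders rather than recommendations.

```python
import time
from collections import deque


class FailureRateWindow:
    # Names and defaults are illustrative assumptions, not tuned values.
    def __init__(self, window_seconds=10.0, min_requests=20,
                 rate_threshold=0.5, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.min_requests = min_requests
        self.rate_threshold = rate_threshold
        self.clock = clock
        self.events = deque()  # (timestamp, failed) pairs, oldest first

    def record(self, failed):
        now = self.clock()
        self.events.append((now, failed))
        self._evict(now)

    def _evict(self, now):
        # Drop outcomes that have aged out of the sliding window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def failure_rate(self):
        self._evict(self.clock())
        total = len(self.events)
        if total == 0:
            return 0.0
        failures = sum(1 for _, failed in self.events if failed)
        return failures / total

    def should_trip(self):
        self._evict(self.clock())
        # Below the minimum volume the sample is too small to judge.
        if len(self.events) < self.min_requests:
            return False
        return self.failure_rate() >= self.rate_threshold
```

With min_requests set to 4, two failures out of two requests give a 100% failure rate but do not trip; two failures out of four give 50% and do, assuming a threshold of 0.5.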
Key idea
Tune a breaker on failure rate over a window with a minimum volume, and pick an open duration that gives the dependency room to recover without flapping.