Why four signals
Modern systems emit thousands of metrics, and watching all of them at once buries the few that matter. The four golden signals are a deliberately small set that captures user-facing health for almost any request-driven service. If all four look healthy, the service is very likely serving users well.
The signals
- Latency is how long a request takes. Track success latency and failure latency separately, because a fast error is not the same as a fast success.
- Traffic is how much demand the system sees, such as requests per second.
- Errors is the rate of failed requests, including explicit failures and wrong answers that still return success codes.
- Saturation is how full the most constrained resource is, such as CPU, memory, or a queue. Saturation predicts problems before they become outages.
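As a rough illustration of the definitions above, the sketch below derives all four signals from one window of request records. The `Request` type, the `golden_signals` function, the choice of median as the latency statistic, and the use of queue depth as the constrained resource are assumptions made for this example, not part of any particular monitoring tool.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    # Hypothetical record of one handled request.
    duration_ms: float
    ok: bool  # False for explicit failures and for wrong answers


def golden_signals(requests, window_seconds, queue_depth, queue_capacity):
    """Summarize one observation window into the four signals."""
    # Latency is tracked separately for successes and failures.
    successes = [r.duration_ms for r in requests if r.ok]
    failures = [r.duration_ms for r in requests if not r.ok]
    total = len(requests)
    return {
        "success_latency_ms": median(successes) if successes else None,
        "failure_latency_ms": median(failures) if failures else None,
        # Traffic: demand expressed as requests per second.
        "traffic_rps": total / window_seconds,
        # Errors: fraction of requests that failed.
        "error_rate": len(failures) / total if total else 0.0,
        # Saturation: how full the most constrained resource is
        # (assumed here to be a bounded queue).
        "saturation": queue_depth / queue_capacity,
    }


# Made-up sample data: three successes and one fast failure in a 2 s window.
sample = [
    Request(120, True),
    Request(80, True),
    Request(100, True),
    Request(5, False),
]
signals = golden_signals(sample, window_seconds=2, queue_depth=30, queue_capacity=40)
print(signals)
```

In this sample the single failure returns in 5 ms. Folding it into the success latency would pull the figure down and make the service look faster than it is, which is why the two are kept apart.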
How they work together
A spike in traffic that drives up saturation often shows as rising latency and then climbing errors. Watching all four lets you see the chain of cause and effect rather than just the final symptom.
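One way to make that chain visible is to record when each signal first crosses an alert threshold and then sort the signals by that time. The sketch below does this on an invented timeline. The sample values, the thresholds, and the `first_breach` helper are all hypothetical and chosen only to illustrate the ordering.

```python
def first_breach(series, threshold):
    """Return the index of the first sample above threshold, or None."""
    for index, value in enumerate(series):
        if value > threshold:
            return index
    return None


# Invented samples, one per time step, for a traffic spike.
timeline = {
    "traffic_rps": [100, 180, 260, 300, 310],
    "saturation": [0.40, 0.65, 0.92, 0.98, 0.99],
    "latency_ms": [80, 85, 140, 400, 900],
    "error_rate": [0.001, 0.001, 0.002, 0.008, 0.12],
}

# Hypothetical alert thresholds for each signal.
thresholds = {
    "traffic_rps": 150,
    "saturation": 0.90,
    "latency_ms": 200,
    "error_rate": 0.01,
}

breaches = {
    name: first_breach(samples, thresholds[name])
    for name, samples in timeline.items()
}

# Order the signals by when they first breached to expose the chain.
order = sorted(
    (name for name, step in breaches.items() if step is not None),
    key=lambda name: breaches[name],
)
print(order)
```

With this data the order comes out as traffic, saturation, latency, then errors. An alert on errors alone would fire three steps after the first sign of trouble.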
Key idea
Latency, traffic, errors, and saturation are the four signals that capture user-facing health with almost no noise.