Dashboards and golden signals
A dashboard can drown you in graphs. The golden signals are a small, opinionated set of metrics that capture the health of a request-driven service, so a dashboard built around them stays readable.
The four signals
- Latency, how long requests take, watched at high percentiles, not just the average
- Traffic, how much demand the service handles, such as requests per second
- Errors, the rate of failed requests, including requests that report success but return wrong results
- Saturation, how full the system is on its most constrained resource, such as CPU or memory
These four answer the questions that matter most: are requests slow, how busy are we, are we failing, and how close are we to the limit?
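As a rough illustration, the four signals can be computed from one window of request records. This is a minimal sketch, not a monitoring implementation: the Request record, the golden_signals function, and the cpu_used and cpu_capacity parameters are hypothetical names chosen for the example, and CPU is simply assumed to be the most constrained resource.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record for one handled request.
    duration_ms: float
    ok: bool  # False for failures, including responses with wrong results


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def golden_signals(requests, window_seconds, cpu_used, cpu_capacity):
    """Summarize one time window of requests as the four golden signals."""
    durations = [r.duration_ms for r in requests]
    failed = sum(1 for r in requests if not r.ok)
    return {
        # Latency: report a high percentile alongside the median.
        "latency_p50_ms": percentile(durations, 50),
        "latency_p99_ms": percentile(durations, 99),
        # Traffic: requests per second over the window.
        "traffic_rps": len(requests) / window_seconds,
        # Errors: fraction of requests that failed.
        "error_rate": failed / len(requests),
        # Saturation: assumes CPU is the most constrained resource.
        "saturation": cpu_used / cpu_capacity,
    }


# 98 fast successes and 2 slow failures in a 10 second window.
window = [Request(10.0, True)] * 98 + [Request(500.0, False)] * 2
signals = golden_signals(window, window_seconds=10, cpu_used=3.0, cpu_capacity=4.0)
print(signals)
```

In this sample the median latency is 10 ms while the 99th percentile is 500 ms, which is why the high percentile is worth watching: the slow tail does not show up in the median at all.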
Designing the dashboard
Put the golden signals at the top so a glance tells you if users are hurting. Below them, place supporting detail for diagnosis. Separate latency for successful requests from latency for failed ones, since errors are often fast and would otherwise flatter your latency numbers.
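The effect of fast errors on latency can be seen with a small worked example. This is a sketch under stated assumptions: the (duration_ms, ok) sample format and the median_latency_by_outcome function are hypothetical, and the median stands in for whichever latency statistic the dashboard plots.

```python
from statistics import median


def median_latency_by_outcome(samples):
    """Median latency overall, for successes only, and for failures only.

    samples is an iterable of (duration_ms, ok) pairs (hypothetical format).
    A group with no samples maps to None.
    """
    groups = {"all": [], "success": [], "failure": []}
    for duration_ms, ok in samples:
        groups["all"].append(duration_ms)
        groups["success" if ok else "failure"].append(duration_ms)
    return {name: (median(vals) if vals else None) for name, vals in groups.items()}


# During a partial outage: 60 requests fail fast, 40 succeed slowly.
samples = [(5.0, False)] * 60 + [(800.0, True)] * 40
result = median_latency_by_outcome(samples)
print(result)  # {'all': 5.0, 'success': 800.0, 'failure': 5.0}
```

The combined median of 5 ms looks healthy, yet every successful request took 800 ms. Plotting the success and failure series separately keeps the fast failures from hiding that.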
Key idea
Latency, traffic, errors, and saturation summarize service health, so lead dashboards with these golden signals.