The cardinality problem
Metrics get powerful when you attach labels, such as a status code or a region, that let you slice the data. But labels carry a hidden danger called cardinality.
What cardinality means
The cardinality of a metric is the number of unique combinations of its label values. Each unique combination becomes its own time series that the system must store and update separately. Label counts multiply: a metric with a region label of five values and a status label of ten values has at most 5 × 10 = 50 series, which is fine.
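The arithmetic can be sketched in a few lines of Python. The region and status values here are hypothetical, chosen only to match the five-by-ten example; the point is that the product is an upper bound, while the series that actually exist are the distinct combinations observed.

```python
from itertools import product

# Hypothetical label values, chosen only to match the 5 x 10 example.
regions = ["us-east", "us-west", "eu-west", "ap-south", "sa-east"]
statuses = [200, 201, 204, 301, 400, 401, 403, 404, 500, 503]

# Upper bound: every region value paired with every status value.
max_series = len(regions) * len(statuses)

# Each distinct (region, status) tuple stands for one time series.
all_series = set(product(regions, statuses))

# Only combinations that are actually observed become series.
observed = [("us-east", 200), ("us-east", 200), ("eu-west", 404)]
active_series = set(observed)

print(max_series, len(all_series), len(active_series))
```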
How it explodes
The trouble starts with labels whose values are unbounded. Putting a user ID, a request ID, or a raw URL into a label creates a new series for every distinct value, so millions of users become millions of series. Memory use and query cost grow with the number of series, and the metrics backend can fall over.
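A small simulation makes the difference concrete. This is a sketch, not a real metrics client: `count_series` is a hypothetical helper that treats each distinct label tuple as one series, and the traffic is made up.

```python
from collections import defaultdict


def count_series(events, label_names):
    """Return how many distinct label combinations appear in events."""
    series = defaultdict(int)
    for event in events:
        # The tuple of label values identifies one time series.
        key = tuple(event[name] for name in label_names)
        series[key] += 1
    return len(series)


# Hypothetical traffic: 10,000 requests, each from a different user,
# spread over five regions.
regions = ["us-east", "us-west", "eu-west", "ap-south", "sa-east"]
events = [
    {"region": regions[i % len(regions)], "user_id": f"user-{i}"}
    for i in range(10_000)
]

bounded = count_series(events, ["region"])
unbounded = count_series(events, ["region", "user_id"])

print(bounded, unbounded)
```

With only the region label the series count stays at the number of regions no matter how much traffic arrives. Adding the user id makes the count track the number of users instead.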
Staying safe
- Use labels only for values from a small fixed set
- Push high-cardinality detail into logs or traces instead
- Bucket fine-grained or continuous values, for example a status class such as 4xx instead of the exact code
- Set limits so a runaway label cannot take down the system
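Two of these practices, bucketing and limits, can be sketched together. Both `status_class` and `LimitedCounter` are hypothetical names written for this illustration, not the API of any particular metrics library; real backends usually enforce limits in configuration rather than in application code.

```python
def status_class(code: int) -> str:
    """Collapse an exact HTTP status code into its class, e.g. 404 -> '4xx'."""
    return f"{code // 100}xx"


class LimitedCounter:
    """Hypothetical counter that refuses to create series past max_series."""

    def __init__(self, max_series: int):
        self.max_series = max_series
        self.series = {}
        self.dropped = 0

    def inc(self, **labels) -> bool:
        # Sort the items so the same labels always produce the same key.
        key = tuple(sorted(labels.items()))
        if key not in self.series and len(self.series) >= self.max_series:
            # Over the limit: record the drop and create no new series.
            self.dropped += 1
            return False
        self.series[key] = self.series.get(key, 0) + 1
        return True


counter = LimitedCounter(max_series=3)
for code in [200, 201, 404, 500, 503, 302]:
    counter.inc(status=status_class(code))

print(len(counter.series), counter.dropped)
```

Six distinct status codes collapse into four classes, and the limit of three series means the fourth class is dropped and counted rather than stored. Tracking the number of drops is a design choice that keeps a runaway label visible instead of silently discarding data.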
Key idea
Cardinality is the count of unique label combinations, so keep label values few and bounded and push unique identifiers into logs or traces.