More than a number
A model can be accurate yet unusable if no one understands why it decides as it does. Interpretability is the degree to which a human can understand the reasons behind a model output. In high stakes settings it is not optional.
Why it matters
- Trust: people accept decisions they can understand and question.
- Debugging: you cannot fix a failure you cannot explain.
- Fairness auditing: spotting that a model relies on a biased proxy requires seeing inside it.
- Compliance: many domains legally require an explanation for automated decisions.
Two routes to it
- Intrinsic: use a simple model such as a small decision tree or linear model that is readable by design.
- Post hoc: apply explanation tools like feature attributions to a complex model after training.
The tradeoff
More complex models can be more accurate but harder to interpret. The right balance depends on the stakes of the decision.
Key idea
Interpretability lets humans understand why a model decides as it does, which underpins trust, debugging, fairness auditing, and compliance, and it can come intrinsically or through post hoc explanation tools.