The idea
LIME stands for Local Interpretable Model-agnostic Explanations. It explains a single prediction by approximating the complex model with a simple, interpretable model that only needs to be faithful in the neighborhood of that one input.
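The original LIME paper writes this trade-off as an optimization problem. The form below is simplified: as an assumption of notation, the surrogate g is evaluated directly on the perturbed samples z rather than on a separate interpretable representation. Here f is the black box, G is a family of simple models such as linear models, π_x weights samples by their proximity to x, and Ω penalizes the complexity of g:

$$
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g),
\qquad
\mathcal{L}(f, g, \pi_x) = \sum_{z \in \mathcal{Z}} \pi_x(z)\,\bigl(f(z) - g(z)\bigr)^2,
\qquad
\pi_x(z) = \exp\!\left(-\frac{D(x, z)^2}{\sigma^2}\right)
$$

𝒵 is the set of perturbed samples, D is a distance, and σ is the kernel width that controls how local the fit is.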
How it works
LIME builds a local explanation in a few steps.
- Take the instance to explain and create many perturbed versions nearby
- Ask the black box model for a prediction on each perturbed sample
- Weight each sample by how close it is to the original instance
- Fit a simple model, often linear, on these weighted samples
- Read the simple model's coefficients as the feature contributions
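The five steps above can be sketched with NumPy alone. This is a minimal illustration for numeric tabular inputs, not the `lime` package's implementation: Gaussian perturbation, the exponential kernel, and the names `lime_explain`, `noise_scale`, and `kernel_width` are assumptions chosen for clarity. `predict_fn` stands for any black box that maps an `(n, d)` array to `n` scores.

```python
import numpy as np


def lime_explain(predict_fn, x, num_samples=1000, noise_scale=0.5,
                 kernel_width=1.0, seed=0):
    """Fit a locally weighted linear surrogate around x (illustrative sketch).

    predict_fn: hypothetical black box, (n, d) array -> (n,) array of scores
    x:          the (d,) instance to explain
    Returns (intercept, coefficients); coefficients are per-feature contributions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # Step 1: perturbed copies of x (assumption: isotropic Gaussian noise).
    Z = x + rng.normal(scale=noise_scale, size=(num_samples, x.size))

    # Step 2: ask the black box for a prediction on every perturbed sample.
    y = predict_fn(Z)

    # Step 3: proximity weights from an exponential kernel on squared distance.
    sq_dist = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-sq_dist / kernel_width ** 2)

    # Step 4: weighted least squares; scaling rows by sqrt(w) gives the weighted fit.
    A = np.hstack([np.ones((num_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)

    # Step 5: the slopes are the local feature contributions.
    return coef[0], coef[1:]


# Usage: a nonlinear black box f(z) = z0**2 + z1, explained at x = (1, 0).
black_box = lambda Z: Z[:, 0] ** 2 + Z[:, 1]
intercept, contributions = lime_explain(black_box, [1.0, 0.0])
print(contributions)  # approximately the local slopes (2, 1)
```

Because the design matrix is centered at `x`, each slope reads directly as that feature's local effect. For a purely linear black box the surrogate recovers its coefficients exactly; for a nonlinear one it recovers an approximation of the local slope.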
Why it is model agnostic
LIME only needs to query the model and read its outputs, so it works with any classifier (or regressor), including one whose internals it cannot inspect. That is what model agnostic means.
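To make this concrete, here is a hedged sketch of the only contract a LIME-style explainer relies on: a function from a batch of inputs to a batch of scores. For a classifier, this sketch assumes the common approach of explaining the predicted probability of one class. `opaque_predict_proba` is a hypothetical stand-in for a model whose internals we never read, and `class_score` is an illustrative helper, not part of any library.

```python
from typing import Callable

import numpy as np

# The whole interface: (n, d) inputs -> (n,) or (n, k) outputs. Nothing else.
PredictFn = Callable[[np.ndarray], np.ndarray]


def class_score(predict_proba: PredictFn, class_index: int) -> PredictFn:
    """Adapt any probability-returning classifier into a one-score black box."""
    return lambda Z: predict_proba(Z)[:, class_index]


def opaque_predict_proba(Z: np.ndarray) -> np.ndarray:
    """Hypothetical classifier; the explainer only calls it, never inspects it."""
    logits = np.sin(Z[:, 0]) + Z[:, 1] ** 3
    p1 = 1.0 / (1.0 + np.exp(-logits))
    return np.column_stack([1.0 - p1, p1])


score_fn = class_score(opaque_predict_proba, class_index=1)

# Probing the black box the way LIME does: perturb around x, then call.
rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])
Z = x + rng.normal(scale=0.5, size=(5, 2))
print(score_fn(Z))  # five class-1 probabilities, one per perturbed sample
```

Swapping in a different model, such as a tree ensemble or a neural network, changes only the function passed in; the perturbation, weighting, and fitting code stays the same.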
Strengths and cautions
LIME is intuitive and works across data types such as tabular data, text, and images. However, its explanations can be unstable, because they depend on the random perturbations and on the chosen neighborhood size (the kernel width). Run it more than once and check that the explanation is consistent before trusting it.
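One way to act on that advice is to repeat the explanation under different random seeds and measure how much the contributions move. This is an illustrative check, not a standard LIME API: `local_coefficients` is a compact version of the weighted linear surrogate (Gaussian perturbations and an exponential kernel are assumptions), and `stability_report` is a hypothetical helper that returns the mean and spread of each contribution plus how often the top-ranked feature agrees across runs.

```python
import numpy as np


def local_coefficients(predict_fn, x, seed, num_samples=500,
                       noise_scale=0.5, kernel_width=1.0):
    """Slopes of a weighted linear surrogate around x (compact sketch)."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=noise_scale, size=(num_samples, x.size))
    # sqrt of exponential-kernel weights, applied to rows for weighted least squares
    sw = np.sqrt(np.exp(-np.sum((Z - x) ** 2, axis=1) / kernel_width ** 2))
    A = np.hstack([np.ones((num_samples, 1)), Z - x])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], predict_fn(Z) * sw, rcond=None)
    return coef[1:]


def stability_report(predict_fn, x, seeds=range(10), **kwargs):
    """Re-run the explanation per seed; summarize spread and top-feature agreement."""
    runs = np.array([local_coefficients(predict_fn, x, s, **kwargs) for s in seeds])
    top = np.argmax(np.abs(runs), axis=1)  # most important feature in each run
    agreement = np.mean(top == np.bincount(top).argmax())
    return runs.mean(axis=0), runs.std(axis=0), agreement


black_box = lambda Z: Z[:, 0] ** 2 + Z[:, 1]
mean, std, agreement = stability_report(black_box, np.array([1.0, 0.0]))
print(mean, std, agreement)
```

A large standard deviation, or an agreement fraction well below 1, signals that the explanation should not be trusted as is. Passing a different `kernel_width` through `**kwargs` is a quick way to see how sensitive the result is to the neighborhood size.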
Key idea
LIME fits a simple model on perturbed samples near one instance to approximate any black box locally.