A lazy learner
K nearest neighbors stores the training data and does no real fitting. To predict, it finds the k closest stored points and lets them vote for classification or average for regression.
Choosing k
- Small k follows the data closely but is sensitive to noise.
- Large k smooths predictions but can blur real boundaries.
- Odd k avoids ties in binary voting.
What it depends on
- A distance metric to decide who is near.
- Feature scaling, since a large scale feature can dominate the distance.
- A reasonable number of features, because distances blur in very high dimensions.
Costs and tradeoffs
- Training is instant; prediction is slow since it scans the data.
- Memory grows with the dataset.
- Spatial indexes like trees speed up neighbor search in low dimensions.
Key idea
K nearest neighbors predicts from the closest stored examples. It needs a distance metric and scaled features, and shifts all the cost from training to prediction time.