What it is
K-nearest neighbors (KNN) makes a prediction by finding the k training points closest to the query and combining their targets. For classification the neighbors vote and the majority label wins. For regression it averages their values.
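The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, and the function names (`knn_neighbors`, `knn_classify`, `knn_regress`) and the toy data are made up for the example.

```python
import math
from collections import Counter

def knn_neighbors(train_X, query, k):
    """Return the indices of the k training points closest to query."""
    # Assumption: Euclidean distance; other metrics are possible.
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    return [i for _, i in dists[:k]]

def knn_classify(train_X, train_y, query, k):
    """Majority label among the k nearest neighbors."""
    votes = Counter(train_y[i] for i in knn_neighbors(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k):
    """Mean target value among the k nearest neighbors."""
    idx = knn_neighbors(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(points, labels, (2.0, 2.0), k=3))
print(knn_regress(points, values, (2.0, 2.0), k=3))
```

Both functions share the same neighbor search and differ only in how they combine the neighbors' targets.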
Lazy learning
KNN does almost no work at training time. It simply stores the data. All the effort happens at prediction time when it searches for neighbors. This is why it is called a lazy learner.
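To make the lazy split concrete, here is a hedged sketch of a tiny classifier whose `fit` only stores the data and whose `predict` does all the searching. The class name `LazyKNNClassifier` and its interface are hypothetical, chosen just for this illustration, and Euclidean distance is assumed.

```python
import math
from collections import Counter

class LazyKNNClassifier:
    """Illustrative only: 'training' memorizes the data, nothing more."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No model is built here; the training set is simply kept.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict(self, query):
        # All the real work happens at query time: rank every stored
        # point by distance to the query, then vote among the top k.
        order = sorted(range(len(self.X)),
                       key=lambda i: math.dist(self.X[i], query))
        votes = Counter(self.y[i] for i in order[:self.k])
        return votes.most_common(1)[0][0]

model = LazyKNNClassifier(k=1).fit([(0.0, 0.0), (5.0, 5.0)], ["low", "high"])
print(model.predict((4.0, 4.0)))
```

The cost of `fit` does not depend on anything but copying the data, while every call to `predict` touches the whole stored set.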
Choosing k
The value of k controls how smooth the decision boundary (or fitted curve) is.
- A small k, such as 1, follows the training data closely and is sensitive to noise.
- A large k smooths the boundary but can blur real structure.
- Odd values of k avoid tied votes in two-class problems. With more than two classes, ties are still possible.
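The trade-off in the list above can be seen on a small made-up dataset. The sketch below assumes Euclidean distance and unweighted voting; the helper `knn_classify` and the one-dimensional data are hypothetical. One point labeled "b" sits among the "a" points as noise, and a small genuine "b" cluster sits far away.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    """Unweighted majority vote among the k nearest points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 1-D data: four "a" points, one noisy "b" at 3.1,
# and a small real "b" cluster at 9 and 10.
X = [(1.0,), (2.0,), (3.0,), (4.0,), (3.1,), (9.0,), (10.0,)]
y = ["a", "a", "a", "a", "b", "b", "b"]

# Near the noisy point: k=1 copies the noise, k=3 outvotes it.
print(knn_classify(X, y, (3.2,), k=1))  # follows the noisy "b"
print(knn_classify(X, y, (3.2,), k=3))  # two "a" neighbors win

# Inside the real "b" cluster: k=3 still sees it, but using all
# seven points just returns the overall majority class.
print(knn_classify(X, y, (9.5,), k=3))
print(knn_classify(X, y, (9.5,), k=7))
```

In practice k is usually picked by comparing a few candidate values on held-out data rather than by inspection.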
Practical concerns
- Scale your features first (for example, by standardizing them), since features with large numeric ranges otherwise dominate the distance.
- Prediction is slow on big datasets because a brute-force search compares each query against every stored point.
- It degrades badly when there are many features, because distances become less informative in high dimensions, a symptom of the curse of dimensionality.
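The scaling point is easy to demonstrate. The sketch below uses hypothetical (age, income) rows and standardizes each feature to zero mean and unit variance, which is one common choice among several. The helper names `standardize` and `nearest_index` are invented for this example, and the code assumes no feature is constant (a zero standard deviation would divide by zero).

```python
import math
import statistics

def standardize(rows, query):
    """Rescale each feature to zero mean and unit variance.

    Statistics come from the training rows only and are then
    applied to the query. Assumes no feature is constant.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]

    def scale(row):
        return tuple((v - m) / s for v, m, s in zip(row, means, stds))

    return [scale(r) for r in rows], scale(query)

def nearest_index(rows, query):
    """Index of the single closest row (Euclidean distance)."""
    return min(range(len(rows)), key=lambda i: math.dist(rows[i], query))

# Hypothetical (age, income) rows: income spans thousands, age spans tens.
people = [(25, 50000), (60, 50500), (27, 52000)]
query = (26, 50400)

# Unscaled, income dominates: the 60-year-old is "closest".
print(nearest_index(people, query))

# After standardizing, age counts too and the 25-year-old is closest.
scaled_people, scaled_query = standardize(people, query)
print(nearest_index(scaled_people, scaled_query))
```

The neighbor changes only because the units changed, which is why scaling is normally done before any distance is computed.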
Key idea
K-nearest neighbors stores all the training data and predicts by voting (or averaging) among the closest points, with k trading off noise sensitivity against smoothness.