← Lessons

quiz vs the machine

Silver1120

Machine Learning

The K Nearest Neighbors

Classify a point by asking its closest neighbors to vote.

4 min read · intro · beat Silver to climb

A lazy learner

K nearest neighbors stores the training data and does no real fitting. To predict, it finds the k closest stored points and lets them vote for classification or average for regression.

Choosing k

  • Small k follows the data closely but is sensitive to noise.
  • Large k smooths predictions but can blur real boundaries.
  • Odd k avoids ties in binary voting.

What it depends on

  • A distance metric to decide who is near.
  • Feature scaling, since a large scale feature can dominate the distance.
  • A reasonable number of features, because distances blur in very high dimensions.

Costs and tradeoffs

  • Training is instant; prediction is slow since it scans the data.
  • Memory grows with the dataset.
  • Spatial indexes like trees speed up neighbor search in low dimensions.

Key idea

K nearest neighbors predicts from the closest stored examples. It needs a distance metric and scaled features, and shifts all the cost from training to prediction time.

Check yourself

Answer to earn rating on the learn ladder.

1. What happens at training time for k nearest neighbors?

2. Why is feature scaling important for KNN?