← Lessons

quiz vs the machine

Gold1360

Machine Learning

The Distance Metrics

Different rulers for nearness change who counts as a neighbor.

4 min read · core · beat Gold to climb

Why the metric matters

Any nearness based method, from neighbors to clustering, needs a rule for distance. The choice of metric reshapes which points look close, so it directly changes predictions.

Common metrics

  • Euclidean distance is straight line distance, the everyday default.
  • Manhattan distance sums absolute differences, like city blocks, and resists extreme single dimensions.
  • Cosine similarity compares the angle between vectors, ignoring magnitude, useful for text.
  • Hamming distance counts mismatching positions for categorical or binary data.

Scaling and the curse

  • Without scaling, features in large units dominate Euclidean distance.
  • In very high dimensions, distances between points grow similar, the curse of dimensionality, weakening nearness.

Picking one

  • Use cosine when direction matters more than length, as with documents.
  • Use Manhattan when you want robustness to a few large coordinate gaps.
  • Match the metric to the data type and the meaning of similarity.

Key idea

Distance metrics define what nearness means. Euclidean, Manhattan, cosine, and Hamming each suit different data, and scaling plus dimensionality strongly affect them.

Check yourself

Answer to earn rating on the learn ladder.

1. Which metric ignores vector magnitude and compares direction?

2. What is the curse of dimensionality for distances?

3. Which metric fits categorical or binary strings best?