The Distance Metrics

Why the metric matters

Any nearness based method, from neighbors to clustering, needs a rule for distance. The choice of metric reshapes which points look close, so it directly changes predictions.

Common metrics

Euclidean distance is straight line distance, the everyday default.
Manhattan distance sums absolute differences, like city blocks, and resists extreme single dimensions.
Cosine similarity compares the angle between vectors, ignoring magnitude, useful for text.
Hamming distance counts mismatching positions for categorical or binary data.

Scaling and the curse

Without scaling, features in large units dominate Euclidean distance.
In very high dimensions, distances between points grow similar, the curse of dimensionality, weakening nearness.

Picking one

Use cosine when direction matters more than length, as with documents.
Use Manhattan when you want robustness to a few large coordinate gaps.
Match the metric to the data type and the meaning of similarity.

Key idea

Distance metrics define what nearness means. Euclidean, Manhattan, cosine, and Hamming each suit different data, and scaling plus dimensionality strongly affect them.

The Distance Metrics

Why the metric matters

Common metrics

Scaling and the curse

Picking one

Key idea

Check yourself