The Cosine Similarity Deep Dive

Measuring direction

Cosine similarity is the cosine of the angle between two nonzero vectors: their dot product divided by the product of their Euclidean lengths, cos θ = (a · b) / (|a| |b|). The result ranges from −1 for vectors pointing in opposite directions, through 0 for orthogonal vectors, to 1 for vectors pointing the same way.
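The definition above translates almost directly into code. The following is a minimal sketch using only the Python standard library; the function name `cosine_similarity` is illustrative rather than any particular library's API. It rejects zero vectors explicitly, because the angle (and therefore the score) is undefined when either length is zero.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```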

Why it ignores magnitude

Because it divides out both vector lengths, cosine similarity cares only about direction, not magnitude. Two embeddings pointing the same way score one even if one is much longer. This is helpful when length reflects something irrelevant, like document size or word frequency, rather than meaning.
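To make the magnitude point concrete, here is a small sketch comparing a vector with a scaled copy of itself. The term-count vectors are hypothetical, chosen only to stand in for a short document and a ten-times-longer document with the same word proportions. Euclidean distance between them is large, while cosine similarity stays at 1 up to floating-point rounding.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); math.hypot with several arguments needs Python 3.8+
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

doc = [2.0, 1.0, 0.0]        # hypothetical term counts for a short document
doc_x10 = [20.0, 10.0, 0.0]  # same proportions, ten times longer

print(math.dist(doc, doc_x10))          # about 20.12: far apart by distance
print(cosine_similarity(doc, doc_x10))  # approximately 1.0: same direction
```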

Relation to distance

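On vectors of the same length, cosine similarity is directly tied to Euclidean distance. For unit vectors, the squared distance equals 2 − 2 cos θ, so maximizing cosine similarity is the same as minimizing Euclidean distance. That is why many embedding systems normalize vectors to unit length first and then compare them; after normalization, the plain dot product equals the cosine similarity.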
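The normalize-then-compare pattern can be sketched as follows, again with only the standard library. The helper names `normalize` and `dot` and the candidate vectors are hypothetical, for illustration only, and `normalize` assumes a nonzero input. The loop checks the identity ‖a − b‖² = 2 − 2 cos θ for each unit-vector pair, and the final lines show that ranking by cosine similarity and ranking by Euclidean distance pick the same nearest candidate.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (assumes v is nonzero)."""
    n = math.hypot(*v)
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

query = normalize([1.0, 2.0, 2.0])
candidates = {  # hypothetical embeddings, normalized up front
    "a": normalize([2.0, 4.0, 4.1]),
    "b": normalize([3.0, 0.0, 1.0]),
    "c": normalize([-1.0, 1.0, 0.5]),
}

for name, v in candidates.items():
    cos = dot(query, v)  # for unit vectors, the dot product is the cosine
    sq_dist = math.dist(query, v) ** 2
    # For unit vectors: squared Euclidean distance = 2 - 2 * cosine.
    assert math.isclose(sq_dist, 2 - 2 * cos, abs_tol=1e-12)

by_cosine = max(candidates, key=lambda k: dot(query, candidates[k]))
by_distance = min(candidates, key=lambda k: math.dist(query, candidates[k]))
print(by_cosine, by_distance)  # a a
```

Normalizing once at indexing time means each later comparison is a single dot product, which is one common reason systems store unit-length embeddings.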

Practical notes

It is bounded and easy to interpret as a score.
It is robust when only the pattern of features matters.
It can mislead if magnitude genuinely carries signal.

Key idea

Cosine similarity scores the angle between vectors, ignoring magnitude, which makes it a robust and interpretable choice for embeddings where direction carries the meaning.