Measuring direction
Cosine similarity is the cosine of the angle between two nonzero vectors. It equals their dot product divided by the product of their lengths, so it is undefined when either vector has zero length. The result ranges from minus one for opposite directions, through zero for orthogonal vectors, to one for the same direction.
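The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name cosine_similarity and the sample vectors are assumptions chosen for the example, and raising an error on a zero-length vector is one possible design choice among several.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # The angle is undefined when either vector has zero length.
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Illustrative vectors: same direction, orthogonal, and opposite direction.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

Up to floating-point rounding, the three scores land at the ends and the middle of the range: one, zero, and minus one.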
Why it ignores magnitude
Because it divides out both vector lengths, cosine similarity cares only about direction, not magnitude. Two embeddings pointing the same way score one even if one is much longer. This is helpful when length reflects something irrelevant, like document size or word frequency, rather than meaning.
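To make the scale invariance concrete, the sketch below stretches one vector by a factor of fifty and compares the scores. The vectors, the scale factor, and the helper names are hypothetical, picked only for illustration.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

# Hypothetical embeddings: `stretched` points the same way as `short`
# but is fifty times longer; `other` points somewhere else.
short = [0.2, 0.1, 0.4]
stretched = [50.0 * x for x in short]
other = [0.4, 0.1, 0.2]

score_same_direction = cosine_similarity(short, stretched)

# Rescaling one input does not change its score against a third vector.
score_short_other = cosine_similarity(short, other)
score_stretched_other = cosine_similarity(stretched, other)

# Euclidean distance, by contrast, grows with the length difference.
euclidean_gap = norm([x - y for x, y in zip(short, stretched)])
```

Here score_same_direction comes out at one up to rounding, and the two scores against other agree, while euclidean_gap is large because it still reflects the difference in length.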
Relation to distance
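For vectors of equal length, cosine similarity and Euclidean distance are directly linked. In particular, for unit-length vectors the squared Euclidean distance equals two minus twice the cosine similarity, so ranking candidates by highest cosine similarity gives the same order as ranking them by smallest distance. That is why many embedding systems normalize first, then compare.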
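The link to distance can be checked numerically. For unit-length vectors, expanding the squared distance gives 1 + 1 - 2 * (dot product), that is, 2 - 2 * cosine. The sketch below verifies this on made-up vectors and confirms that both measures rank the candidates identically; the query, the candidates, and the helper names are all assumptions for illustration.

```python
import math

def normalize(v):
    """Scale a nonzero vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical query and candidates, normalized before comparison.
query = normalize([1.0, 2.0, 3.0])
candidates = [
    normalize(v)
    for v in ([3.0, 2.0, 1.0], [1.0, 2.0, 2.5], [-1.0, 0.5, 0.0])
]

# On unit vectors the dot product is the cosine similarity, and the
# squared Euclidean distance should equal 2 - 2 * cosine.
identity_holds = all(
    math.isclose(squared_distance(query, c), 2.0 - 2.0 * dot(query, c), abs_tol=1e-12)
    for c in candidates
)

# Candidate indices ordered best-first under each measure.
indices = range(len(candidates))
by_cosine = sorted(indices, key=lambda i: -dot(query, candidates[i]))
by_distance = sorted(indices, key=lambda i: squared_distance(query, candidates[i]))
```

Because the two orderings coincide after normalization, a system can store unit vectors once and use whichever comparison its index supports.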
Practical notes
- It is bounded and easy to interpret as a score.
- It is robust when only the pattern of features matters.
- It can mislead if magnitude genuinely carries signal.
Key idea
Cosine similarity scores the angle between vectors, ignoring magnitude, which makes it a robust and interpretable choice for embeddings where direction carries the meaning.