What normalization does
L2 normalization rescales a vector so its length becomes one without changing its direction: you divide each component by the vector's Euclidean length (its L2 norm, the square root of the sum of its squared components). After this step every embedding lies on the surface of the unit sphere.
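A minimal Python sketch of that rescaling follows. The function name `l2_normalize` and the `eps` cutoff are illustrative assumptions, not part of any particular library. Near-zero vectors are returned unchanged, since a vector with no length has no direction to preserve.

```python
from __future__ import annotations

import math
from typing import Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> list[float]:
    """Return a copy of `vec` rescaled to unit Euclidean (L2) length."""
    # L2 norm: square root of the sum of squared components.
    norm = math.sqrt(sum(x * x for x in vec))
    # A (near-)zero vector has no direction; return it unchanged rather
    # than dividing by ~0.
    if norm < eps:
        return list(vec)
    # Dividing every component by the same positive number keeps the direction.
    return [x / norm for x in vec]


print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```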
Why it helps
- It makes the dot product equal to cosine similarity, so a fast inner-product search returns rankings based on the angle between vectors.
- It removes magnitude differences caused by artifacts such as text length, so comparisons depend only on direction, which is where the meaning is expected to be encoded.
- It bounds every similarity score to the range [-1, 1], so scores are comparable across the whole index.
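The first two points can be checked numerically. Cosine similarity is defined as the dot product divided by the product of the two lengths, so when both lengths are one the division disappears. The sketch below uses made-up vectors and hypothetical helper names (`dot`, `cosine`, `unit`) to confirm that the dot product of normalized vectors matches the cosine of the raw ones, and that scaling one vector changes the raw dot product but not the normalized score.

```python
from __future__ import annotations

import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product: what a fast inner-product index computes."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity from its definition: a.b / (|a| |b|)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def unit(a: Sequence[float]) -> list[float]:
    """Scale a non-zero vector to unit L2 length."""
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


a = [1.0, 2.0, 3.0]
b = [2.0, 0.5, 1.0]

# Dot product of the normalized vectors matches cosine of the raw vectors.
same_score = math.isclose(dot(unit(a), unit(b)), cosine(a, b))

# Scaling `a` by 10 (a stand-in for a magnitude artifact) multiplies the raw
# dot product by 10 but leaves the normalized score unchanged.
a_big = [10.0 * x for x in a]
raw_ratio = dot(a_big, b) / dot(a, b)
scale_invariant = math.isclose(dot(unit(a_big), unit(b)), dot(unit(a), unit(b)))

print(same_score, raw_ratio, scale_invariant)  # True 10.0 True
```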
When to be careful
If the magnitude of an embedding genuinely carries information, normalizing throws that information away. Some models are also trained to score with unnormalized vectors, so always match the setup the encoder was trained for.
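A small illustration of the information loss, using two hypothetical embeddings that point in the same direction but differ in length by a factor of ten. After normalization they are indistinguishable, so any signal the length carried is gone.

```python
from __future__ import annotations

import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Rescale a non-zero vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Hypothetical embeddings: same direction, lengths 5 and 50. Assume for
# illustration that the length encodes something useful.
weak = [3.0, 4.0]
strong = [30.0, 40.0]

print(math.sqrt(sum(x * x for x in weak)), math.sqrt(sum(x * x for x in strong)))  # 5.0 50.0
print(l2_normalize(weak) == l2_normalize(strong))  # True: the difference is gone
```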
A practical pipeline
A common flow is to encode the text, L2-normalize the output, store it in a vector index, and then query with a normalized vector using inner-product search. This gives cosine-ranked nearest neighbors at the speed of a dot product.
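The pipeline can be sketched with a brute-force stand-in for a vector index. The `InnerProductIndex` class and the three embeddings are hypothetical, chosen so that the largest-magnitude document would win under a raw dot product (18.1 versus 2.0 for the better-aligned one) but loses once everything is normalized. Encoding is left out; the vectors stand in for encoder output.

```python
from __future__ import annotations

import heapq
import math
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Rescale to unit L2 length; a zero vector has no direction."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class InnerProductIndex:
    """Hypothetical brute-force stand-in for a real vector index.

    Vectors are normalized on insert and queries are normalized before
    scoring, so the inner product it ranks by equals cosine similarity.
    """

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._vecs: list[list[float]] = []

    def add(self, doc_id: str, embedding: Sequence[float]) -> None:
        self._ids.append(doc_id)
        self._vecs.append(l2_normalize(embedding))

    def search(self, query: Sequence[float], k: int = 2) -> list[tuple[str, float]]:
        q = l2_normalize(query)
        # Score every stored vector with a plain dot product.
        scored = (
            (sum(x * y for x, y in zip(q, v)), doc_id)
            for doc_id, v in zip(self._ids, self._vecs)
        )
        top = heapq.nlargest(k, scored)
        return [(doc_id, score) for score, doc_id in top]


# Made-up embeddings standing in for encoder output (hypothetical values).
index = InnerProductIndex()
index.add("short_doc", [1.0, 0.0, 0.0])
index.add("long_doc", [9.0, 1.0, 0.0])  # large magnitude, less aligned
index.add("off_topic", [0.0, 0.0, 5.0])

for doc_id, score in index.search([2.0, 0.1, 0.0], k=2):
    print(doc_id, round(score, 3))
# short_doc 0.999
# long_doc 0.998
```

Normalizing the stored vectors is what removes magnitude from the ranking; normalizing the query as well keeps the scores themselves equal to cosine similarity and inside [-1, 1]. At scale, libraries such as FAISS offer an exact inner-product index (`IndexFlatIP`) and a `normalize_L2` helper for the same pattern.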
Key idea
Normalizing embeddings to unit length puts them on a sphere where the dot product equals cosine similarity, giving clean, bounded comparisons, as long as the magnitude was not carrying a signal you needed.