What the dimension means
The dimension of an embedding is how many numbers make up each vector, often between 64 and 1536. More dimensions give the model more room to encode subtle distinctions, but each extra dimension costs memory and compute.
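To make this concrete, here is a minimal Python sketch with hand-written 4-dimensional toy vectors. The values and the names cat, kitten, and car are invented for illustration and do not come from any real model. It shows that the dimension is just the length of the vector, and that comparing two vectors touches every dimension once, which is why compute grows with dimension.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; work grows linearly with len(a)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (made-up values, dimension 4).
cat = [0.9, 0.1, 0.3, 0.0]
kitten = [0.8, 0.2, 0.4, 0.1]
car = [0.0, 0.9, 0.1, 0.8]

print("dimension:", len(cat))
print("cat vs kitten:", round(cosine_similarity(cat, kitten), 3))
print("cat vs car:", round(cosine_similarity(cat, car), 3))
```

With these toy values the related pair scores higher than the unrelated pair. A real model would produce vectors of the same shape, only much longer.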
The tradeoff
- Too few dimensions crowd unrelated items together and lose detail. The representation underfits.
- Too many dimensions waste storage, slow search, and can encode noise instead of useful signal.
The right size depends on data complexity and how many items you must distinguish.
The curse of dimensionality
In very high dimensions, distances between points become more uniform, so the contrast between a point's nearest and farthest neighbors can weaken. Good training counteracts this by concentrating useful structure on a lower-dimensional manifold inside the space.
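The distance concentration effect can be observed with a small experiment. The sketch below assumes points drawn uniformly at random from the unit cube, which is a worst case with no learned structure at all and is not how trained embeddings behave. The function name relative_contrast and the sample sizes are illustrative choices.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest

contrasts = {dim: relative_contrast(dim) for dim in (2, 16, 128, 512)}
for dim, contrast in contrasts.items():
    print(f"dim={dim:4d}  relative contrast={contrast:.3f}")
```

In low dimensions the nearest point is many times closer than the farthest, so the contrast is large. As the dimension grows, all distances cluster around a similar value and the contrast shrinks toward zero. Structured data that lies near a lower-dimensional manifold suffers far less from this than the uniform random points used here.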
Cost at scale
Storage and search both scale with dimension. At 4 bytes per value (float32), a billion vectors take about 6.1 TB at 1536 dimensions but about 1.5 TB at 384, before any index overhead. Techniques like reducing dimension, quantizing, or using Matryoshka-style truncation help control this cost.
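A rough sketch of the cost arithmetic and two of these techniques follows. It assumes raw float32 storage (4 bytes per value) with no index overhead, and it uses plain symmetric int8 scalar quantization as one simple example of quantizing. All function names are illustrative. Note that truncating a vector to its leading dimensions only preserves quality when the model was trained for it, as in Matryoshka-style training; for an ordinary model this is an assumption that may not hold.

```python
import math

def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw vector storage, ignoring index overhead. Default assumes float32."""
    return num_vectors * dim * bytes_per_value

def truncate_and_renormalize(vector, keep_dims):
    """Keep the first keep_dims values and rescale to unit length.

    Only meaningful if the model was trained so that leading
    dimensions carry the most information (Matryoshka-style).
    """
    head = vector[:keep_dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]

def quantize_int8(vector):
    """Symmetric scalar quantization: one float scale plus int codes in [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original values."""
    return [c * scale for c in codes]

billion = 1_000_000_000
for dim, width, label in [(1536, 4, "float32"), (384, 4, "float32"), (384, 1, "int8")]:
    terabytes = storage_bytes(billion, dim, width) / 1e12
    print(f"{dim:5d} dims, {label}: {terabytes:.3f} TB")

short = truncate_and_renormalize([3.0, 4.0, 12.0], keep_dims=2)
codes, scale = quantize_int8([0.5, -1.0, 0.25])
print(short, codes, dequantize(codes, scale))
```

Dropping from 1536 to 384 dimensions cuts raw storage by a factor of four, and storing one byte per value instead of four cuts it by another factor of four, at the price of some reconstruction error per value.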
Key idea
Embedding dimension trades representational capacity against memory and search cost. The best choice leaves enough room to separate the items you care about without paying to store and search more numbers than the data needs.