The Dimensionality of Embeddings

What the dimension means

The dimension of an embedding is how many numbers make up each vector, often between 64 and 1536. More dimensions give the model more room to encode subtle distinctions, but each extra dimension costs memory and compute.

The tradeoff

With too few dimensions, unrelated items get crowded together and fine distinctions are lost. In effect, the representation underfits.
With too many dimensions, vectors waste storage, slow down search, and can encode noise rather than signal.

The right size depends on data complexity and how many items you must distinguish.

The curse of dimensionality

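In very high dimensions, distances between points become more uniform, so the gap between the nearest and the farthest neighbor shrinks relative to the distances themselves, and nearest-neighbor search becomes less discriminating. Good training counteracts this by concentrating useful structure on a lower-dimensional manifold inside the space.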
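The concentration effect can be seen on unstructured data. The sketch below is an illustration, not a measurement of any real embedding model: it assumes points drawn uniformly at random from the unit cube, and the helper name relative_contrast is made up for this example. It reports how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for one random query.

    Points and query are uniform random in [0, 1]^dim, which is an
    assumption for illustration; trained embeddings are not uniform.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


# The contrast should shrink as the dimension grows.
for dim in (2, 16, 128, 1024):
    print(dim, round(relative_contrast(dim), 3))
```

Trained embeddings usually behave better than this random baseline, because their useful structure occupies far fewer effective dimensions than the nominal vector length.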

Cost at scale

Storage and search both scale with dimension. A billion vectors at 1536 dimensions, stored as 32-bit floats, take about 6.1 TB before any index overhead, compared with about 1.5 TB at 384 dimensions. Techniques like reducing dimension, quantizing, or using Matryoshka-style truncation help control this cost.
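The arithmetic and two of the cost-control techniques can be sketched in a few lines. This is a minimal sketch under stated assumptions: raw storage only (no index overhead), 32-bit floats by default, and a simple symmetric int8 scheme chosen for illustration. The function names are hypothetical, and truncation of this kind only preserves quality when the model was trained so that leading dimensions carry the most information, as Matryoshka-style models are.

```python
import math


def storage_bytes(n_vectors, dim, bytes_per_value=4):
    """Raw vector storage, ignoring index structures and metadata."""
    return n_vectors * dim * bytes_per_value


def truncate_and_normalize(vec, keep):
    """Keep the first `keep` values and rescale to unit length.

    Assumes a model trained for prefix truncation (Matryoshka style).
    """
    head = vec[:keep]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def quantize_int8(vec):
    """Symmetric scalar quantization to integers in [-127, 127]."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid a zero scale
    return [round(x / scale) for x in vec], scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original values."""
    return [c * scale for c in codes]


billion = 10**9
for dim, width in ((1536, 4), (384, 4), (384, 1)):
    tb = storage_bytes(billion, dim, width) / 1e12
    print(f"{dim} dims, {width} byte(s) per value: {tb:.3f} TB")

codes, scale = quantize_int8([0.12, -0.5, 0.33, 0.04])
print(codes, dequantize(codes, scale))
```

Truncation and quantization compose: cutting 1536 dimensions to 384 and storing one byte per value reduces raw storage by a factor of 16, at some cost in retrieval quality that has to be measured on the actual data.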

Key idea

Embedding dimension trades representational capacity against memory and search cost. The best choice gives the model enough room to separate the items you care about without paying to store and search more numbers than the data needs.

Check yourself