Machine Learning

The Dimensionality of Embeddings

Choosing how many numbers represent each item, and the tradeoffs involved.

5 min read · core

What the dimension means

The dimension of an embedding is the number of values in each vector, commonly somewhere between 64 and 1536. More dimensions give the model more room to encode subtle distinctions, but every extra dimension adds memory and compute for each vector stored or compared.
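To make the definition concrete, here is a minimal sketch using hypothetical 4-dimensional vectors (the words and values are invented for illustration, not taken from any real model). The dimension is simply the length of each vector, and a similarity computation has to visit every dimension once, which is where the per-dimension compute cost comes from.

```python
import math

# Hypothetical toy embeddings with 4 dimensions each.
# Real models typically use far more dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.3, 0.0],
    "dog": [0.8, 0.2, 0.4, 0.1],
    "car": [0.0, 0.9, 0.1, 0.7],
}


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; work grows linearly with dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


dimension = len(embeddings["cat"])
cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])

print("dimension:", dimension)
print("cat vs dog:", round(cat_dog, 3))
print("cat vs car:", round(cat_car, 3))
```

In this toy setup the two animals score as more similar to each other than either does to the vehicle, using only four numbers per item.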

The tradeoff

  • With too few dimensions, unrelated items get crowded together and fine distinctions are lost. The representation underfits.
  • With too many dimensions, storage is wasted, search slows down, and the extra capacity can fit noise rather than signal.

The right size depends on data complexity and how many items you must distinguish.

The curse of dimensionality

In very high dimensions, distances between points tend to become more uniform, so the gap between the nearest and the farthest neighbor shrinks and nearest-neighbor search becomes less discriminating. Good training counteracts this by concentrating useful structure on a lower-dimensional manifold inside the space.
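The concentration effect can be observed with a small experiment. The sketch below assumes uniformly random points, which is a worst case with no learned structure at all, and measures a relative contrast: how much farther the farthest point is than the nearest one, as a fraction of the nearest distance. The function name and parameters are illustrative choices.

```python
import math
import random


def relative_contrast(dim, num_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query.

    Points are drawn uniformly from the unit hypercube, so there is no
    structure for a nearest-neighbor search to exploit.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


for dim in (2, 32, 512):
    print(dim, round(relative_contrast(dim), 3))
```

The contrast drops as the dimension grows, meaning the nearest neighbor is barely closer than a typical point. Trained embeddings usually behave better than this because their points are not spread uniformly through the space.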

Cost at scale

Storage and brute-force search cost both grow roughly linearly with dimension. A billion vectors at 1536 dimensions take four times the space of the same vectors at 384. Techniques such as dimensionality reduction, quantization, or Matryoshka-style truncation help control this cost.
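A back-of-the-envelope sketch of these costs follows. It assumes raw vectors stored as 4-byte floats with no index overhead, and models quantization as 1 byte per value. The helper names are hypothetical. The truncation helper keeps only the leading dimensions and rescales to unit length, which preserves quality only if the model was trained so that its leading dimensions carry the most information (the Matryoshka-style case).

```python
import math


def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw vector storage only; assumes no index or metadata overhead."""
    return num_vectors * dim * bytes_per_value


def truncate_and_normalize(vector, keep_dims):
    """Keep the first keep_dims values and rescale to unit length.

    Only appropriate when the model was trained for truncation.
    """
    head = vector[:keep_dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


BILLION = 10**9
full = index_size_bytes(BILLION, 1536)                        # float32
small = index_size_bytes(BILLION, 384)                        # float32
quantized = index_size_bytes(BILLION, 1536, bytes_per_value=1)  # 1 byte/value

for label, size in (("1536 x 4 bytes", full),
                    ("384 x 4 bytes", small),
                    ("1536 x 1 byte", quantized)):
    print(label, size / 1e12, "TB")

print(truncate_and_normalize([3.0, 4.0, 12.0], keep_dims=2))
```

Under these assumptions, quantizing 1536 dimensions to one byte each takes the same space as 384 dimensions at full float precision, so the two techniques can be traded against each other or combined.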

Key idea

Embedding dimension trades representational capacity against memory and search cost. The best choice balances having enough room to separate items against the expense of storing and searching many numbers.

Check yourself

1. What is a downside of using too few embedding dimensions?

2. Why do very high dimensions also pose problems?