Machine Learning

The Embedding Visualization

Projecting high-dimensional vectors down to two dimensions you can actually see.

5 min read · intro

Why visualize

Embeddings usually have hundreds of dimensions, far more than anyone can picture. Dimensionality reduction projects them to two or three dimensions so you can inspect clusters, spot outliers, and sanity-check that similar items group together.

Common methods

  • PCA finds the directions of greatest variance. It is fast and linear, which makes it good for a first look.
  • t-SNE preserves local neighborhoods, revealing tight clusters but distorting global distances.
  • UMAP is usually faster than t-SNE and tends to keep more of the global structure.
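
To make the PCA bullet concrete, here is a minimal sketch in Python with NumPy, assuming the embeddings are already stacked into an (n, d) array. The function name `pca_2d` and the random demo data are hypothetical. The sketch centers the data, takes its singular value decomposition, and keeps the coordinates along the two directions of greatest variance, together with the share of total variance each of those directions explains.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: project (n, d) embeddings onto the top two principal components."""
    # Center each dimension so the components describe variance around the mean.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of every point along the two directions of greatest variance.
    coords = centered @ vt[:2].T
    # Fraction of the total variance captured by each of those two directions.
    explained = (s[:2] ** 2) / np.sum(s ** 2)
    return coords, explained


# Hypothetical data: 300 vectors in 384 dimensions, drawn around three random centers.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 384))
labels = np.repeat(np.arange(3), 100)
embeddings = centers[labels] + rng.normal(size=(300, 384))

coords, explained = pca_2d(embeddings)
print(coords.shape)  # (300, 2)
print(explained)     # variance share of the first and second component
```

A small explained-variance share hints that a 2-D PCA view hides much of the structure. For t-SNE and UMAP you would normally use an existing implementation, such as scikit-learn's `sklearn.manifold.TSNE` (which takes a `perplexity` parameter) or `umap.UMAP` from the `umap-learn` package (which takes `n_neighbors`), rather than writing one by hand.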

Reading the plots carefully

These plots are interpretive aids, not exact maps. In t-SNE and UMAP, the distances between clusters and the sizes of clusters are often not meaningful, and results shift with settings such as perplexity (t-SNE) or the number of neighbors (UMAP). Treat them as qualitative.
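
Because the plots are qualitative, it can help to pair them with a number. One simple check, sketched below under the assumption of Euclidean distance, asks how many of each point's k nearest neighbors in the original space are still among its k nearest neighbors in the 2-D view. The helpers `knn_indices` and `knn_overlap`, the choice k = 10, and the synthetic data are illustrative assumptions. scikit-learn offers a related, more refined score as `sklearn.manifold.trustworthiness`.

```python
import numpy as np


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: indices of each point's k nearest neighbors, excluding itself."""
    sq = (points ** 2).sum(axis=1)
    # Squared pairwise Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, shape (n, n).
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_overlap(high: np.ndarray, low: np.ndarray, k: int = 10) -> float:
    """Hypothetical helper: average share of high-dimensional neighbors kept in the 2-D view."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high.tolist(), nn_low.tolist())]
    return sum(shared) / (k * len(shared))


# Hypothetical data: 200 points on a 2-D plane embedded in 50 dimensions, plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
high = latent @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(200, 50))

# A faithful 2-D view: project onto the top two principal directions.
centered = high - high.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
faithful = centered @ vt[:2].T

# A scrambled view: the same 2-D coordinates assigned to the wrong points.
scrambled = faithful[rng.permutation(200)]

print(knn_overlap(high, faithful))   # close to 1.0
print(knn_overlap(high, scrambled))  # near chance, roughly k / (n - 1)
```

The score only speaks to local neighborhoods. A high value says nothing about whether the distances between clusters in the plot are meaningful.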

What to look for

  • Clean separation of known categories suggests a healthy embedding.
  • Mixed or smeared clusters can hint at a weak model or noisy labels.
  • Isolated points far from everything else may flag data problems.
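
The first and last items on this checklist can also be checked numerically in the original embedding space, where no projection can distort anything. The sketch below is a minimal illustration under stated assumptions: `nearest_centroid_accuracy` and `isolated_points` are hypothetical helper names, Euclidean distance is assumed, and the cutoff of three times the median nearest-neighbor distance is an arbitrary choice, not a standard threshold.

```python
import numpy as np


def nearest_centroid_accuracy(emb: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical helper: share of points whose nearest class centroid is their own class.

    Uses leave-one-out centroids and assumes every class has at least two points.
    """
    classes, idx, counts = np.unique(labels, return_inverse=True, return_counts=True)
    sums = np.zeros((len(classes), emb.shape[1]))
    np.add.at(sums, idx, emb)  # per-class sums of embedding vectors
    centroids = sums / counts[:, None]
    # Squared distance from every point to every class centroid: shape (n, n_classes).
    d2 = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Leave-one-out: a point should not count toward its own class centroid,
    # otherwise the score is biased upward in high dimensions.
    own = (sums[idx] - emb) / (counts[idx] - 1)[:, None]
    d2[np.arange(len(emb)), idx] = ((emb - own) ** 2).sum(axis=1)
    return float(np.mean(np.argmin(d2, axis=1) == idx))


def isolated_points(emb: np.ndarray, factor: float = 3.0) -> np.ndarray:
    """Hypothetical helper: indices whose nearest-neighbor distance exceeds factor x median.

    factor=3.0 is an arbitrary, assumed cutoff.
    """
    sq = (emb ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(d2, np.inf)  # ignore each point's zero distance to itself
    nn_dist = np.sqrt(np.maximum(d2.min(axis=1), 0.0))
    return np.flatnonzero(nn_dist > factor * np.median(nn_dist))


# Hypothetical data: three well-separated groups of 100 vectors in 384 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 384))
labels = np.repeat(np.arange(3), 100)
emb = centers[labels] + rng.normal(size=(300, 384))

print(nearest_centroid_accuracy(emb, labels))    # 1.0: known categories separate cleanly
shuffled = rng.permutation(labels)               # simulate badly noisy labels
print(nearest_centroid_accuracy(emb, shuffled))  # much lower

# Append one planted point far from every group, then look for isolated points.
with_outlier = np.vstack([emb, np.full((1, 384), 20.0)])
print(isolated_points(with_outlier))             # [300]
```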

Key idea

Visualizing embeddings means projecting them to two dimensions with PCA, t-SNE, or UMAP to inspect cluster structure, but the plots are qualitative because inter-cluster distances can be distorted.

Check yourself

1. Why must we reduce dimensions before plotting embeddings?

2. What should you be careful about when reading a t-SNE plot?