UMAP for Visualization

UMAP (Uniform Manifold Approximation and Projection) is a nonlinear dimensionality reduction method often used in place of t-SNE. It tends to run faster and to preserve more global structure.

How it works

UMAP builds a weighted nearest-neighbor graph in the high-dimensional space, treating the data as samples from a manifold and the edge weights as fuzzy (graded) connection strengths. It then optimizes a low-dimensional layout whose neighbor graph is as similar as possible to the original.
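As a rough illustration of the graph-building step, the sketch below computes fuzzy edge weights in plain Python. It follows the construction usually described for UMAP: each point's neighbor distances are rescaled so that its nearest neighbor gets weight 1, and the two directed weights of a pair are merged with a fuzzy union. The function names (`knn`, `find_sigma`, `fuzzy_graph`), the brute-force neighbor search, and the sample points are assumptions made for this example, not part of any library.

```python
import math

def knn(points, k):
    """For each point, return its k nearest neighbors as sorted (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result

def find_sigma(dists, rho, target, iters=64):
    """Binary-search a scale sigma so that sum(exp(-(d - rho) / sigma)) is close to target."""
    lo, hi = 1e-6, 1e3
    for _ in range(iters):
        mid = (lo + hi) / 2
        total = sum(math.exp(-max(d - rho, 0.0) / mid) for d in dists)
        if total > target:
            hi = mid  # weights too large: shrink the scale
        else:
            lo = mid  # weights too small: grow the scale
    return (lo + hi) / 2

def fuzzy_graph(points, k):
    """Return a symmetric n x n matrix of fuzzy edge weights in [0, 1]."""
    n = len(points)
    directed = [[0.0] * n for _ in range(n)]
    for i, neighbors in enumerate(knn(points, k)):
        dists = [d for d, _ in neighbors]
        rho = dists[0]  # distance to the nearest neighbor
        sigma = find_sigma(dists, rho, math.log2(k))
        for d, j in neighbors:
            directed[i][j] = math.exp(-max(d - rho, 0.0) / sigma)
    # Fuzzy union of the two directed weights: a + b - a * b.
    return [
        [directed[i][j] + directed[j][i] - directed[i][j] * directed[j][i]
         for j in range(n)]
        for i in range(n)
    ]

# Two well-separated groups of four points each (made-up data).
points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (2.5, 1.5),
    (20.0, 20.0), (21.0, 20.0), (20.0, 22.0), (22.5, 21.5),
]
graph = fuzzy_graph(points, k=3)
for row in graph:
    print([round(w, 2) for w in row])
```

With k smaller than the group size, every neighbor lies inside a point's own group, so all weights between the two groups come out as zero while each point keeps a weight of 1 to its nearest neighbor.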

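The graph is fuzzy in that each edge carries a weight between 0 and 1, so it records strong connections to the closest neighbors as well as weaker ones to neighbors farther away.
The layout is found by an optimization that attracts connected points and repels unconnected ones, similar in spirit to a force-directed graph layout.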
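The attraction and repulsion idea can be sketched with a small gradient descent in plain Python. This is a simplification under stated assumptions: it uses full-batch updates over all pairs rather than the stochastic edge sampling used in practice, it fixes the low-dimensional similarity to 1 / (1 + d^2), and the function name `optimize_layout`, the learning rate, and the toy weight matrix are all made up for the example.

```python
import random

def optimize_layout(weights, dim=2, epochs=200, lr=0.1, seed=0):
    """Place n points so that heavily weighted pairs end up close together.

    weights[i][j] in [0, 1] is the fuzzy edge weight between points i and j.
    Assumes low-dimensional similarity q = 1 / (1 + d^2) (a simplification).
    """
    rng = random.Random(seed)
    n = len(weights)
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(epochs):
        moves = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [a - b for a, b in zip(y[i], y[j])]
                d2 = sum(c * c for c in diff)
                w = weights[i][j]
                # Attraction pulls i toward j in proportion to the edge weight.
                attract = -2.0 * w / (1.0 + d2)
                # Repulsion pushes i away from j in proportion to (1 - weight).
                repel = 2.0 * (1.0 - w) / ((1e-3 + d2) * (1.0 + d2))
                for c in range(dim):
                    # Clip each contribution so near-coincident points stay stable.
                    moves[i][c] += max(-4.0, min(4.0, (attract + repel) * diff[c]))
        for i in range(n):
            for c in range(dim):
                y[i][c] += lr * moves[i][c]
    return y

# Toy graph: two groups of three points, strong edges inside each group only.
groups = [0, 0, 0, 1, 1, 1]
weights = [
    [0.9 if i != j and groups[i] == groups[j] else 0.0 for j in range(6)]
    for i in range(6)
]
embedding = optimize_layout(weights)
for group, point in zip(groups, embedding):
    print(group, [round(c, 2) for c in point])
```

The two force terms are the gradients of a cross-entropy between the graph weights and the low-dimensional similarities, which is why connected points settle at a small but nonzero distance while unconnected ones keep drifting apart.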

Key hyperparameters

Number of neighbors controls the balance between local detail and global shape. Small values emphasize fine structure; large values reveal the big picture.
Minimum distance sets how tightly points may pack in the low-dimensional layout, affecting how clumped the clusters appear.
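To see why minimum distance changes how clumped clusters look, it helps to plot the similarity the layout aims for as a function of embedded distance. The sketch below assumes the piecewise target commonly used for this purpose: full similarity up to the minimum distance, then exponential decay. The function name `target_similarity` and the `spread` parameter value are assumptions for illustration.

```python
import math

def target_similarity(distance, min_dist=0.1, spread=1.0):
    """Desired similarity of two embedded points at the given distance.

    Points closer than min_dist count as fully similar, so the optimizer has
    no reason to pull them any closer; beyond that, similarity decays.
    """
    if distance <= min_dist:
        return 1.0
    return math.exp(-(distance - min_dist) / spread)

# Compare how three settings treat the same embedded distances.
for min_dist in (0.0, 0.1, 0.5):
    row = [round(target_similarity(d, min_dist), 3) for d in (0.05, 0.25, 1.0)]
    print(f"min_dist={min_dist}: {row}")
```

A larger minimum distance widens the flat region at the top of the curve, so neighbors are allowed to sit farther apart and clusters look looser. In the umap-learn package the two hyperparameters are exposed as the `n_neighbors` and `min_dist` arguments of `umap.UMAP`.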

Compared to t-SNE

UMAP usually runs faster on large datasets and keeps the relative arrangement of clusters more faithfully. Still, as with t-SNE, absolute distances in a UMAP plot are only approximate, so it is a tool for exploration rather than precise measurement.

Key idea

UMAP embeds data by matching a fuzzy neighbor graph, typically running faster than t-SNE and preserving more global structure, while distances stay approximate.
