UMAP for Visualization
UMAP (Uniform Manifold Approximation and Projection) is a nonlinear dimensionality reduction method often used in place of t-SNE. It tends to run faster and to preserve more of the data's global structure.
How it works
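UMAP builds a weighted nearest-neighbor graph in the high-dimensional space, treating the data as samples from a manifold and the graph as a fuzzy approximation of that manifold's shape. It then optimizes a low-dimensional layout whose own fuzzy neighbor graph matches the original as closely as possible, with the mismatch measured by cross-entropy.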
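- A fuzzy graph gives each edge a membership strength between 0 and 1, so it captures both close connections (weights near 1) and weaker, more distant ones.
- The layout is found by stochastic gradient descent that balances attraction along graph edges against repulsion from randomly sampled points, similar in spirit to a force-directed graph layout.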
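The two steps above can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the umap-learn implementation: it uses brute-force Euclidean distances (real implementations use approximate nearest-neighbor search on large data), random initialization, and fixes the embedding similarity to 1 / (1 + d^2), i.e. a = b = 1. Each point i gets a local offset rho_i (distance to its nearest neighbor) and a bandwidth sigma_i chosen so its neighbor weights sum to log2(k); the directed weights are then merged with the fuzzy union w_ij + w_ji - w_ij * w_ji. The function names fuzzy_graph and optimize_layout are hypothetical.

```python
import numpy as np


def fuzzy_graph(X, k=15, n_iter=64):
    """Symmetric fuzzy k-NN graph in the spirit of UMAP (dense, brute force)."""
    n = X.shape[0]
    # Pairwise Euclidean distances; O(n^2) memory, fine for a small demo.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]       # indices of the k nearest neighbors
    knn_d = np.take_along_axis(d, knn, axis=1)
    rho = knn_d[:, 0]                        # distance to the nearest neighbor
    target = np.log2(k)
    W = np.zeros((n, n))
    for i in range(n):
        gaps = np.maximum(knn_d[i] - rho[i], 0.0)
        lo, hi, sigma = 0.0, np.inf, 1.0
        # Binary search for the bandwidth sigma_i so the weights sum to log2(k).
        for _ in range(n_iter):
            s = np.exp(-gaps / sigma).sum()
            if abs(s - target) < 1e-5:
                break
            if s > target:                   # weights too large: shrink sigma
                hi = sigma
                sigma = (lo + hi) / 2
            else:                            # weights too small: grow sigma
                lo = sigma
                sigma = sigma * 2 if hi == np.inf else (lo + hi) / 2
        # Nearest neighbor gets weight 1; farther neighbors decay exponentially.
        W[i, knn[i]] = np.exp(-gaps / sigma)
    # Fuzzy union turns the directed weights into one symmetric membership.
    return W + W.T - W * W.T


def optimize_layout(W, dim=2, n_epochs=200, n_neg=5, seed=0):
    """SGD with attraction along edges and repulsion from random points (a = b = 1)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    Y = rng.uniform(-10, 10, size=(n, dim))  # random initial layout
    rows, cols = np.nonzero(W)
    weights = W[rows, cols]
    for epoch in range(n_epochs):
        lr = 1.0 - epoch / n_epochs          # linearly decaying step size
        # Sample each edge with probability equal to its membership strength.
        active = rng.random(weights.size) < weights
        for i, j in zip(rows[active], cols[active]):
            diff = Y[i] - Y[j]
            d2 = diff @ diff
            # Attraction: gradient step on log(1 + d^2) pulls i and j together.
            grad = np.clip(-2.0 / (1.0 + d2) * diff, -4, 4)
            Y[i] += lr * grad
            Y[j] -= lr * grad
            # Repulsion (negative sampling): push i away from random points.
            for m in rng.integers(0, n, size=n_neg):
                if m == i:
                    continue
                diff = Y[i] - Y[m]
                d2 = diff @ diff
                grad = np.clip(2.0 / ((0.001 + d2) * (1.0 + d2)) * diff, -4, 4)
                Y[i] += lr * grad
    return Y


if __name__ == "__main__":
    # Two well-separated Gaussian blobs in 5 dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(20, 1, (20, 5))])
    Y = optimize_layout(fuzzy_graph(X, k=5), n_epochs=50)
    print(Y.shape)
```

The attraction and repulsion coefficients are the gradients of the cross-entropy terms for the kernel 1 / (1 + d^2); the small 0.001 offset and the clipping to [-4, 4] keep repulsion from exploding when two points nearly coincide.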
Key hyperparameters
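- Number of neighbors (n_neighbors in the umap-learn library) controls the balance between local detail and global shape: small values emphasize fine structure, while large values reveal the big picture.
- Minimum distance (min_dist) sets how tightly points may pack together in the embedding: small values produce dense, clumped clusters, while larger values spread points out more evenly.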
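One way to see what minimum distance does is to look at the similarity curve it induces in the embedding. As a sketch, assuming the parameterization used by umap-learn: embedding distances d are scored with 1 / (1 + a * d^(2b)), and a, b are fitted to a target curve that stays at 1 up to min_dist and then decays as exp(-(d - min_dist) / spread). umap-learn fits a and b with a least-squares curve fit; the coarse grid search below is a simpler stand-in, and the helper names target_curve, low_dim_similarity, and fit_ab are hypothetical.

```python
import numpy as np


def target_curve(d, min_dist, spread=1.0):
    """Desired embedding similarity: flat up to min_dist, then exponential decay."""
    return np.where(d <= min_dist, 1.0, np.exp(-(d - min_dist) / spread))


def low_dim_similarity(d, a, b):
    """Smooth similarity for embedding distances: 1 / (1 + a * d^(2b))."""
    return 1.0 / (1.0 + a * d ** (2.0 * b))


def fit_ab(min_dist, spread=1.0):
    """Least-squares fit of (a, b) by coarse grid search (stand-in for an optimizer)."""
    d = np.linspace(0.0, 3.0 * spread, 300)
    y = target_curve(d, min_dist, spread)
    a_grid = np.linspace(0.05, 5.0, 100)
    best_err, best_a, best_b = np.inf, None, None
    for b in np.linspace(0.2, 3.0, 100):
        # Rows: candidate a values; columns: sample distances.
        pred = low_dim_similarity(d[None, :], a_grid[:, None], b)
        err = ((pred - y[None, :]) ** 2).sum(axis=1)
        i = int(np.argmin(err))
        if err[i] < best_err:
            best_err, best_a, best_b = err[i], a_grid[i], b
    return best_a, best_b


if __name__ == "__main__":
    for md in (0.0, 0.1, 0.5):
        a, b = fit_ab(md)
        sim = low_dim_similarity(0.5, a, b)
        print(f"min_dist={md}: a={a:.2f}, b={b:.2f}, similarity at d=0.5: {sim:.2f}")
```

A larger min_dist keeps the fitted similarity high over a wider range of small distances, so nearby points are not pulled as tightly together; a smaller min_dist makes the curve drop sooner and lets clusters pack more densely.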
Compared to t-SNE
UMAP usually runs faster on large datasets and preserves the relative arrangement of clusters more faithfully. Still, as with t-SNE, distances in a UMAP plot are only approximate, and cluster sizes and the gaps between clusters should not be read literally, so it is a tool for exploration rather than precise measurement.
Key idea
UMAP embeds data by matching a fuzzy nearest-neighbor graph; it runs faster than t-SNE and preserves more global structure, though distances in the plot remain approximate.