Tracking who belongs together
A disjoint set, also called union find, maintains a collection of elements partitioned into non overlapping groups. It answers one question fast: are two elements in the same group, and it supports merging groups. The disjoint set forest implements it as a forest of trees where each element points to a parent, and the root of each tree names the group.
Two core operations
- Find: follow parent pointers from an element up to its root, which identifies its group.
- Union: to merge two groups, attach the root of one tree under the root of the other.
Naively these can build tall thin trees and become slow, so two optimizations are essential.
The two optimizations
Union by rank always hangs the shorter tree under the taller one, keeping trees shallow. Path compression rewires every node visited during a find to point straight at the root, flattening the tree for future queries. Used together, the amortized cost per operation is almost constant, growing slower than any practical function of the input size. This structure is the engine behind cycle detection and Kruskal's minimum spanning tree algorithm.
Key idea
A disjoint set forest represents groups as parent linked trees, answering find and union by walking to roots, and with union by rank and path compression each operation runs in near constant amortized time.