Join Algorithms Nested Loop Hash Merge
A join combines rows from two inputs that satisfy a condition. Engines implement this with three classic algorithms, each best in different situations.
Nested loop join
A nested loop join scans the outer input and, for each row, probes the inner input for matches. It is simple and works for any condition, but cost grows with the product of sizes unless the inner side has an index.
Hash join
A hash join builds a hash table on the smaller input keyed by the join column, then probes it with rows from the larger input. It shines for equality joins on large unsorted inputs.
Merge join
A merge join requires both inputs sorted on the join key, then walks them together like a zipper. It is excellent when inputs are already sorted or indexed in order.
- Nested loop is flexible and good for tiny or indexed inputs.
- Hash suits large equality joins.
- Merge suits sorted inputs.
Key idea
Nested loop, hash, and merge joins trade flexibility, equality speed, and sorted order, and the optimizer picks among them by cost.