The Apriori Algorithm
Apriori is the classic algorithm for finding frequent itemsets, the building blocks of association rules. Its power comes from one pruning principle that avoids checking an exponential number of item combinations.
The downward closure property
The central idea is the apriori property, also called downward closure.
- If an itemset is frequent, then all of its subsets are also frequent.
- Equivalently, if any subset is infrequent, the whole itemset cannot be frequent.
This lets the algorithm discard huge numbers of candidates before counting them.
The level wise search
Apriori scans the data in passes of growing itemset size.
- Find all frequent single items.
- Combine them into candidate pairs, count their support, keep the frequent ones.
- Combine frequent pairs into candidate triples, and so on.
At each level, any candidate that has an infrequent subset is pruned immediately, so only promising candidates are counted.
Cost and alternatives
Apriori may scan the database once per level, which is its main cost on large data. Later methods such as FP growth compress the data into a tree to avoid repeated scans, but apriori remains the clearest illustration of the pruning idea.
Key idea
Apriori finds frequent itemsets level by level, using downward closure to prune any candidate with an infrequent subset before counting.