← Lessons

quiz vs the machine

Platinum1780

Machine Learning

The Apriori Algorithm

Finding frequent itemsets by pruning with the downward closure property.

5 min read · advanced · beat Platinum to climb

The Apriori Algorithm

Apriori is the classic algorithm for finding frequent itemsets, the building blocks of association rules. Its power comes from one pruning principle that avoids checking an exponential number of item combinations.

The downward closure property

The central idea is the apriori property, also called downward closure.

  • If an itemset is frequent, then all of its subsets are also frequent.
  • Equivalently, if any subset is infrequent, the whole itemset cannot be frequent.

This lets the algorithm discard huge numbers of candidates before counting them.

The level wise search

Apriori scans the data in passes of growing itemset size.

  • Find all frequent single items.
  • Combine them into candidate pairs, count their support, keep the frequent ones.
  • Combine frequent pairs into candidate triples, and so on.

At each level, any candidate that has an infrequent subset is pruned immediately, so only promising candidates are counted.

Cost and alternatives

Apriori may scan the database once per level, which is its main cost on large data. Later methods such as FP growth compress the data into a tree to avoid repeated scans, but apriori remains the clearest illustration of the pruning idea.

Key idea

Apriori finds frequent itemsets level by level, using downward closure to prune any candidate with an infrequent subset before counting.

Check yourself

Answer to earn rating on the learn ladder.

1. What does the apriori property state?

2. How does apriori use the property to save work?

3. What is the main cost of apriori on large datasets?