Machine Learning

The Exploration-Exploitation Tradeoff

Balancing trying new actions against using what you know.

Every learning agent faces a tension. Should it exploit the action it currently believes is best, or explore other actions that might turn out better? This is the exploration-exploitation tradeoff.

Why both matter

  • Pure exploitation locks in whatever looks good early, which may be a mistake if estimates are wrong.
  • Pure exploration wastes reward by ignoring what has already been learned.

Good learning needs enough exploration to discover the truth and enough exploitation to cash in on it.

The cost of getting it wrong

If an agent never explores, it can get stuck on a mediocre action because it never sees that something better exists. If it explores forever, it never settles and keeps paying the price of suboptimal choices.
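
Both failure modes can be seen in a toy setting. The following is a minimal sketch, not a definitive implementation. It assumes a two-action bandit with fixed, noise-free payouts, initial estimates of zero, and ties broken toward the first action. The function name `run_bandit` and the `epsilon` parameter (the probability of picking a random action) are illustrative choices, not something defined in this lesson.

```python
import random


def run_bandit(true_rewards, steps, epsilon, seed=0):
    """Run a fixed-epsilon agent on a bandit with noise-free payouts (an assumption).

    epsilon=0.0 is pure exploitation; epsilon=1.0 is pure exploration.
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_actions = len(true_rewards)
    estimates = [0.0] * n_actions  # current belief about each action's value
    counts = [0] * n_actions       # how often each action has been tried
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any action uniformly at random.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the best-looking action (first one wins ties).
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = true_rewards[action]
        counts[action] += 1
        # Incremental sample-average update of the estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return total, counts


# Hypothetical payouts: action 0 is mediocre, action 1 is better.
greedy_total, greedy_counts = run_bandit([0.5, 1.0], steps=1000, epsilon=0.0)
random_total, random_counts = run_bandit([0.5, 1.0], steps=1000, epsilon=1.0)

print(greedy_counts)  # [1000, 0]: the better action is never tried
print(random_counts)  # both actions tried, but pulls stay split at random
```

Under these assumptions the pure exploiter never tries the better action at all, while the pure explorer finds it but keeps spending roughly half its pulls on the worse one.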

Managing the balance

A common pattern is to explore a lot early, when estimates are uncertain, then gradually shift toward exploitation as confidence grows. The amount of exploration is usually controlled by a parameter that decays over time, such as the probability ε of taking a random action in ε-greedy methods.
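
One way to picture this pattern is an agent whose chance of taking a random action shrinks with the step count. The sketch below is illustrative only. The schedule `epsilon_at`, its `half_life` parameter, the two-action bandit, and the small Gaussian reward noise are all assumptions made for the example, not a prescribed method.

```python
import random


def epsilon_at(step, half_life=50):
    """Hypothetical schedule: 1.0 at step 0, halved at step == half_life."""
    return 1.0 / (1.0 + step / half_life)


def run_decaying_agent(mean_rewards, steps, noise=0.1, seed=0):
    """Agent that explores with probability epsilon_at(step), else exploits."""
    rng = random.Random(seed)
    n_actions = len(mean_rewards)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # pulls per action
    for step in range(steps):
        if rng.random() < epsilon_at(step):
            # Explore: likely early on, increasingly rare later.
            action = rng.randrange(n_actions)
        else:
            # Exploit the current best estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # Assumed reward model: true mean plus small Gaussian noise.
        reward = rng.gauss(mean_rewards[action], noise)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return counts, estimates


counts, estimates = run_decaying_agent([0.5, 1.0], steps=2000)
print(epsilon_at(0), epsilon_at(50))  # 1.0 0.5
print(counts)  # most pulls should end up on the better action (index 1)
```

Early steps are spent mostly exploring, so both actions get sampled. As the schedule decays, the agent increasingly commits to whichever action its estimates favor.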

This decay matters for convergence. Many algorithms are only guaranteed to reach the optimal policy if every action keeps being tried infinitely often while the exploration rate still shrinks toward zero, so the decay must not be too fast.
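
For a concrete reading of these two requirements, consider the simplified case of an ε-greedy rule with a step-dependent exploration probability. The setting and the symbols below (\varepsilon_t for the exploration probability at step t, A for the finite action set) are assumptions introduced for illustration, and the argument is a sketch rather than a full convergence proof.

```latex
% Assumption: at step t, a uniformly random action from A is taken with
% probability \varepsilon_t, independently of everything else.
\[
  \Pr(\text{action } a \text{ is chosen at step } t) \;\ge\; \frac{\varepsilon_t}{|A|}
\]
% Because the exploration draws are independent, the second Borel--Cantelli
% lemma gives: every action is tried infinitely often with probability 1 if
\[
  \sum_{t=1}^{\infty} \varepsilon_t = \infty ,
\]
% while exploration fades if
\[
  \lim_{t \to \infty} \varepsilon_t = 0 .
\]
% Example: \varepsilon_t = 1/t satisfies both conditions.
% Counterexample: \varepsilon_t = 0.99^t fades, but its sum is finite (99),
% so only finitely many exploratory steps occur and a wrongly dismissed
% action may never be retried.
```

The two conditions pull in opposite directions: the schedule has to decay, but slowly enough that its sum still diverges.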

Key idea

Learning requires balancing exploration to discover better actions against exploitation to use current knowledge, usually exploring more early and exploiting more as confidence grows.

Check yourself

1. What is the risk of pure exploitation?

2. How is the tradeoff usually managed over time?