Machine Learning

The Contextual Bandit

Bandits that read the situation before choosing what to show.

Adding context

A plain bandit treats every decision the same: it learns a single best arm for all users. In recommendations, though, the right choice depends on who is asking and when. A contextual bandit observes a context vector, such as user features and time of day, before picking an arm, and learns a policy that maps each context to the action with the highest expected reward.

How it learns

  • At each step it sees a context, chooses an action, and observes a reward only for that action.
  • It fits a model predicting reward from context and action, then acts to balance estimated value with uncertainty.
  • Because it sees reward only for the chosen action, it never learns what the other actions would have earned. This is the partial feedback problem, and it forces the bandit to explore to avoid blind spots.
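
The loop described above can be sketched in a few lines. This is a minimal illustration, not a production recipe: the contexts, actions, and click probabilities are made up, and it swaps in epsilon-greedy exploration (a fixed fraction of random choices) with a running average per context-action pair in place of an uncertainty-aware rule, just to keep the partial feedback visible.

```python
import random

def run_bandit(n_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy contextual bandit over discrete contexts (illustrative)."""
    rng = random.Random(seed)
    contexts = ["morning", "evening"]
    actions = ["news", "music"]
    # Hypothetical true click probabilities; the learner never sees this table.
    true_p = {
        ("morning", "news"): 0.8, ("morning", "music"): 0.2,
        ("evening", "news"): 0.3, ("evening", "music"): 0.7,
    }
    counts = {key: 0 for key in true_p}
    means = {key: 0.0 for key in true_p}

    for _ in range(n_steps):
        ctx = rng.choice(contexts)                  # 1. observe a context
        if rng.random() < epsilon:                  # 2. explore at random ...
            act = rng.choice(actions)
        else:                                       #    ... or exploit the estimate
            act = max(actions, key=lambda a: means[(ctx, a)])
        # 3. reward is revealed for the chosen action only (partial feedback)
        reward = 1.0 if rng.random() < true_p[(ctx, act)] else 0.0
        key = (ctx, act)
        counts[key] += 1
        means[key] += (reward - means[key]) / counts[key]  # running average

    # The learned policy: best estimated action for each context.
    policy = {c: max(actions, key=lambda a: means[(c, a)]) for c in contexts}
    return policy, means, counts

policy, means, counts = run_bandit()
print(policy)
```

Only the entry for the chosen pair is updated at each step, so a pair that is never tried keeps its initial estimate forever. That is the blind spot exploration is meant to prevent.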

A common algorithm

LinUCB assumes the expected reward is linear in the context features and maintains a confidence interval around its estimate for each action. It picks the action with the highest upper confidence bound, so it explores contexts and actions it is unsure about while still exploiting strong matches.
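
As a rough sketch of that idea, the code below keeps one ridge-regression model per action (the "disjoint" form of LinUCB) and scores each action as estimated reward plus a confidence width. The class name `LinUCBArm`, the helper `choose`, the exploration weight `alpha`, and the identity-matrix regularizer are choices made for this illustration. The inverse matrix is updated with the Sherman-Morrison formula so the example needs no linear algebra library.

```python
import math

class LinUCBArm:
    """One action's linear model: theta = A^-1 b, with A = I + sum of x x^T."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # how strongly uncertainty is rewarded (assumed value)
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * x over observed rounds

    def _matvec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x):
        theta = self._matvec(self.b)                  # ridge estimate of weights
        mean = sum(t * xi for t, xi in zip(theta, x))
        a_inv_x = self._matvec(x)
        width = self.alpha * math.sqrt(sum(xi * ai for xi, ai in zip(x, a_inv_x)))
        return mean + width                           # upper confidence bound

    def update(self, x, reward):
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def choose(arms, x):
    """Index of the action with the highest upper confidence bound."""
    return max(range(len(arms)), key=lambda a: arms[a].ucb(x))

# Usage: two actions, two context features.
arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
x = [1.0, 0.0]
a = choose(arms, x)          # both bounds tie at first, so index 0 is returned
arms[a].update(x, reward=1.0)
```

After an update the confidence width for similar contexts shrinks, so an action keeps being chosen only if its estimated reward justifies it. Contexts pointing in directions the action has not yet seen keep their full width, which is what draws exploration toward them.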

Why it suits recommendations

  • Personalization is built in: the same item can be chosen for one user and skipped for another.
  • It adapts continuously, handling new items and drifting tastes.
  • It explores efficiently, focusing on uncertain context-action pairs rather than blanket randomness.

Off-policy care

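Logged data comes from the old policy, which decided which actions were shown and therefore which rewards were recorded. Evaluating a new policy on those logs needs importance weighting or other counterfactual estimators to correct for the mismatch between what the old policy chose and what the new one would choose.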
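
One simple form of importance weighting is the inverse propensity scoring (IPS) estimator: each logged reward is reweighted by how much more (or less) likely the new policy is to take the logged action than the old policy was. The sketch below assumes each log record stores the probability with which the old policy chose its action and that this probability is greater than zero. The record layout and the names `ips_value` and `always_a` are hypothetical.

```python
def ips_value(logs, target_prob):
    """Estimate a new policy's average reward from logs of an old policy.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the old policy's probability of the logged action.
    target_prob: function (context, action) -> probability under the new policy.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_prob(context, action) / logging_prob  # importance weight
        total += weight * reward
    return total / len(logs)

# Toy logs: the old policy picked "a" or "b" uniformly at random (prob 0.5).
logs = [
    ("u1", "a", 1.0, 0.5),
    ("u1", "b", 0.0, 0.5),
    ("u2", "a", 0.0, 0.5),
    ("u2", "b", 1.0, 0.5),
]

def always_a(context, action):
    """New policy under evaluation: always show item "a"."""
    return 1.0 if action == "a" else 0.0

print(ips_value(logs, always_a))  # 0.5
```

Records where the new policy would not have taken the logged action get weight zero, and records it agrees with are scaled up to compensate. The estimate becomes noisy when logging probabilities are small, because the weights then grow large.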

Key idea

A contextual bandit chooses actions based on a context vector, learning a personalized policy under partial feedback and exploring uncertain context-action pairs efficiently.

Check yourself

1. What does a contextual bandit add over a plain bandit?

2. What is the partial feedback problem in contextual bandits?

3. Why is importance weighting needed when evaluating a new policy offline?