Machine Learning

The Naive Bayes Variants

Choosing Gaussian, multinomial, or Bernoulli Naive Bayes based on your feature type.

One assumption, several likelihoods

Naive Bayes applies Bayes' rule with the strong assumption that features are conditionally independent given the class. The variants differ only in how they model the likelihood of each feature given the class.
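Written out, the assumption turns the class posterior into a product of per-feature terms. The notation here is an assumption for illustration: c is a class, x_1 through x_n are the feature values, and the predicted class is the one with the highest score. Logs are used because products of many small probabilities underflow.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c) \Big]
```

Each variant supplies a different form for the factor P(x_i | c); everything else stays the same.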

The three common variants

  • Gaussian assumes each continuous feature follows a normal distribution per class. Use it for real-valued measurements.
  • Multinomial models counts, such as word frequencies in a document. It is the classic choice for text classification.
  • Bernoulli models binary presence or absence features. It is useful when only whether a word appears matters, not how often it appears.
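As a rough sketch of how the three likelihoods differ, the functions below compute a log-likelihood for one example under each variant, given parameters for a single class. The function names and argument layout are hypothetical, chosen for illustration, and the parameters are assumed to have been estimated already.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """Log density of one continuous value x under a normal with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def multinomial_log_likelihood(counts, probs):
    """Log-likelihood of a count vector, one count per word.

    The multinomial coefficient is dropped because it is the same for
    every class and does not change which class scores highest.
    """
    return sum(c * math.log(p) for c, p in zip(counts, probs))


def bernoulli_log_likelihood(present, probs):
    """Log-likelihood of 0/1 indicators; an absent feature contributes log(1 - p)."""
    return sum(
        math.log(p) if flag else math.log(1 - p)
        for flag, p in zip(present, probs)
    )
```

Note the practical difference between the last two: in the multinomial form a word with count zero contributes nothing, while in the Bernoulli form the absence of a word is itself evidence.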

Why it works despite naivety

  • The independence assumption is usually false, yet the classifier often ranks classes correctly, even when the probabilities it reports are poorly calibrated.
  • Laplace smoothing adds a small count (typically 1) to every feature count in every class, so a feature value never seen with a class during training does not force that class's probability to zero.
  • Training is just counting (or, for the Gaussian variant, computing per-class means and variances), so it is extremely fast and often works reasonably well with little data.
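The counting and smoothing steps can be sketched in a few lines of plain Python. This is a minimal multinomial example under stated assumptions, not a reference implementation: the function names, the `alpha` parameter (the pseudo-count, with 1.0 giving Laplace smoothing), the toy documents, and the choice to skip words outside the training vocabulary are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit by counting words per class; alpha is the smoothing pseudo-count."""
    vocab = sorted({word for doc in docs for word in doc})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    model = {}
    for c in class_counts:
        # Every vocabulary word gets alpha extra counts in every class.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(labels)),
            "log_prob": {
                w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
            },
        }
    return model


def predict(model, doc):
    """Return the class with the highest log-posterior score."""
    def score(c):
        params = model[c]
        # Assumption: words never seen in training are ignored.
        return params["log_prior"] + sum(
            params["log_prob"][w] for w in doc if w in params["log_prob"]
        )
    return max(model, key=score)


# Toy data, invented for this example.
docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda"],
    ["cheap", "offer"],
    ["project", "meeting"],
]
labels = ["spam", "ham", "spam", "ham"]
model = train_multinomial_nb(docs, labels)
```

In this toy data "pills" never appears in a ham document, yet its smoothed probability under ham is (0 + 1) / (4 + 6) = 0.1 rather than zero, so a ham message containing it is not ruled out outright. Calling `predict(model, ["cheap", "meeting", "cheap"])` returns `"spam"`.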

Key idea

Naive Bayes assumes conditional independence and picks a likelihood to match the feature type: Gaussian for continuous values, multinomial for counts, and Bernoulli for binary indicators. Laplace smoothing avoids zero probabilities, and training is fast.

Check yourself

1. Which Naive Bayes variant fits word count features in text?

2. What does Laplace smoothing prevent?