The Naive Bayes Variants

Choosing Gaussian, multinomial, or Bernoulli Naive Bayes based on your feature type.

One assumption, several likelihoods

Naive Bayes applies Bayes' rule with the strong assumption that features are conditionally independent given the class. The variants differ only in how they model the distribution of each feature given the class.
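Written out, the assumption turns the posterior into a product of one term per feature. The notation here (class y, features x_1 through x_n, predicted class y-hat) is introduced for illustration and does not come from the text above.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \Big[ \log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y) \Big]
```

Each variant supplies a different form for the factor P(x_i | y). The prior P(y) and the argmax stay the same.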

The three common variants

Gaussian assumes each continuous feature follows a normal distribution within each class. Use it for real-valued measurements.
Multinomial models counts, such as word frequencies in a document. It is the classic choice for text classification.
Bernoulli models binary presence-or-absence features. It is useful when only whether a word appears matters, not how often it appears.
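The three likelihoods can be sketched as small log-likelihood functions. This is a minimal illustration using only the standard library, not any particular library's implementation. The function names and parameters are hypothetical, and the per-class parameters (mean, var, probs) are assumed to have been estimated already, with every probability strictly between 0 and 1.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log density of one real-valued feature under a per-class normal."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def multinomial_log_likelihood(counts, probs):
    """Log-likelihood of a count vector given per-class feature probabilities.

    The multinomial coefficient is dropped because it is the same for
    every class and so does not affect which class scores highest.
    """
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c)

def bernoulli_log_likelihood(present, probs):
    """Log-likelihood of a 0/1 vector; an absent feature contributes log(1 - p)."""
    return sum(
        math.log(p) if b else math.log(1 - p)
        for b, p in zip(present, probs)
    )

# The same document seen two ways: word counts vs. word presence.
counts = [3, 0, 1]
present = [1 if c > 0 else 0 for c in counts]
probs = [0.5, 0.3, 0.2]  # assumed per-class word probabilities

multinomial_score = multinomial_log_likelihood(counts, probs)
bernoulli_score = bernoulli_log_likelihood(present, probs)
```

Note that the Bernoulli score penalizes the class for the word that is absent, while the multinomial score ignores words with a zero count.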

Why it works despite naivety

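The independence assumption is usually false, yet the classifier often ranks classes correctly, because classification only needs the highest-scoring class to be right, not accurate probability estimates.
Laplace smoothing adds a small count (typically 1) to every feature count in each class, so a feature value never seen with a class during training does not force that class's probability to zero.
Training is just counting (or, for the Gaussian variant, computing per-class means and variances), so it is extremely fast and often works reasonably well with little data.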
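A minimal sketch of training by counting with Laplace smoothing, for the multinomial variant on a toy word dataset. The function names, the alpha parameter, and the example documents are all made up for illustration. Words outside the training vocabulary are simply skipped at prediction time, which is one possible convention among several.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit by counting class frequencies and per-class word counts."""
    vocab = sorted({w for doc in docs for w in doc})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Laplace smoothing: add alpha to every word count, so the
        # denominator grows by alpha times the vocabulary size.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_lik

def predict(doc, log_prior, log_lik):
    """Return the class with the highest log prior plus summed log-likelihoods."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

docs = [["free", "prize", "free"], ["meeting", "notes"], ["free", "meeting"]]
labels = ["spam", "ham", "ham"]
log_prior, log_lik = train_multinomial_nb(docs, labels)

# "notes" never appears in a spam document, yet its smoothed
# probability under spam is (0 + 1) / (3 + 4) rather than zero.
p_notes_given_spam = math.exp(log_lik["spam"]["notes"])
```

Without smoothing, any document containing "notes" would get probability zero for spam regardless of its other words. With alpha set to 1, the unseen word only lowers the spam score.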

Key idea

Naive Bayes assumes conditional independence and picks a likelihood to match the feature type: Gaussian for continuous values, multinomial for counts, and Bernoulli for binary features. Laplace smoothing avoids zero probabilities, and training is fast.


Check yourself