Probability Distributions Overview
A probability distribution describes how likely each possible outcome of a random variable is. It is the foundation for reasoning about uncertainty in data and models.
Discrete versus continuous
- A discrete distribution assigns probabilities to countable outcomes, like a die roll. Its rule is a probability mass function.
- A continuous distribution spreads probability over a range of values, like a person's height. Its rule is a probability density function, and the probability of an interval is the area under the curve over that interval.
For any valid distribution, the total probability across all outcomes equals one: a sum of masses in the discrete case, an integral of the density in the continuous case.
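A minimal sketch of both cases may help make this concrete. It assumes a fair six-sided die for the discrete case and, purely for illustration, a standard Gaussian density for the continuous case (the text does not name a particular density). The helper names `standard_normal_pdf` and `integrate` are hypothetical, and the integral is only a numerical approximation.

```python
from fractions import Fraction
import math

# Discrete case: a fair six-sided die, with probability 1/6 on each face.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
total_mass = sum(die_pmf.values())  # exact sum of the probability mass function


def standard_normal_pdf(x: float) -> float:
    """Density of a Gaussian with mean 0 and standard deviation 1 (illustrative choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(f, lo: float, hi: float, steps: int = 100_000) -> float:
    """Approximate the area under f on [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# Continuous case: the area under the density over a wide range is close to one.
total_area = integrate(standard_normal_pdf, -10.0, 10.0)

# A probability is an area: the chance of landing within one standard deviation of the mean.
within_one_sd = integrate(standard_normal_pdf, -1.0, 1.0)

print(total_mass)
print(total_area)
print(within_one_sd)
```

The discrete total is exact because the masses are stored as fractions. The continuous total is only approximately one, since the integral is computed numerically over a finite range.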
Summarizing a distribution
Two numbers describe most distributions at a glance.
- The expected value is the long-run average outcome, that is, the mean of the distribution.
- The variance captures how widely outcomes scatter around that mean.
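As a small worked example, both summaries can be computed directly from a probability mass function. The sketch below assumes a fair six-sided die, and the function names `expected_value` and `variance` are illustrative rather than taken from any library.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die (assumed example).
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def expected_value(pmf):
    """Probability-weighted average of the outcomes."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Probability-weighted average squared distance from the mean."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


mean = expected_value(die_pmf)
var = variance(die_pmf)

print(mean)  # 7/2
print(var)   # 35/12
```

The mean of 3.5 is not itself a possible roll, which shows that the expected value describes the long-run average and not a typical single outcome.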
Why models care
Machine learning models often assume a distribution for the data or the noise. Linear regression, in its usual probabilistic form, assumes Gaussian errors, and probabilistic classifiers output a distribution over labels. The choice of family encodes prior beliefs about how outcomes behave.
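To illustrate a classifier producing a distribution over labels, here is a minimal sketch using the softmax function. Softmax is one common way to turn raw scores into probabilities and is an assumption here, not something the text prescribes. The labels and scores are made up for the example.

```python
import math


def softmax(scores):
    """Map real-valued scores to positive probabilities that sum to one."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical labels and raw model scores, invented for illustration.
labels = ["cat", "dog", "bird"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")

# The predicted label is the one with the highest probability.
predicted = labels[probs.index(max(probs))]
print(predicted)
```

The output is a discrete distribution: each label gets a non-negative probability, and the probabilities sum to one up to floating-point rounding.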
Key idea
A probability distribution assigns probabilities to outcomes: discrete distributions through mass functions and continuous ones through density functions, with the total summing or integrating to one.