Binning and Discretization
Binning, or discretization, converts a continuous feature into a small set of ordered buckets. For example, an exact age might be replaced by one of three groups: child, adult, or senior.
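As a minimal sketch of the age example, the snippet below maps exact ages to ordered bucket labels. The cut points at 18 and 65 and the names AGE_EDGES, AGE_LABELS, and age_group are assumptions made for illustration; the text does not specify where the groups begin and end.

```python
from bisect import bisect_right

# Hypothetical cut points: below 18 is "child", 18 to 64 is "adult", 65 and up is "senior".
AGE_EDGES = [18, 65]
AGE_LABELS = ["child", "adult", "senior"]

def age_group(age):
    """Map an exact age to an ordered bucket label."""
    # bisect_right counts how many edges are <= age, which is the bucket index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

ages = [4, 17, 18, 42, 64, 65, 90]
groups = [age_group(a) for a in ages]
print(groups)
# ['child', 'child', 'adult', 'adult', 'adult', 'senior', 'senior']
```

Using bisect_right makes each bucket closed on the left, so an age exactly equal to an edge falls into the higher group.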
Common binning schemes
- Equal-width binning splits the value range into intervals of the same size.
- Equal-frequency (quantile) binning chooses bin edges so each bucket holds roughly the same number of rows.
- Supervised binning places edges where the target's behavior changes, for example at the split points of a decision tree, so the bins are chosen for predictive power.
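The three schemes can be sketched in plain Python on a small, skewed sample. Everything here is illustrative: the data, the function names, and the choice of three bins are assumptions. The supervised function is a deliberately simplified stand-in that finds only a single edge by minimizing within-bucket squared error of the target, which is roughly what one split of a regression tree does.

```python
from bisect import bisect_right

def equal_width_edges(values, n_bins):
    """Interior edges that cut [min, max] into n_bins intervals of equal size."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def equal_frequency_edges(values, n_bins):
    """Interior edges taken at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def squared_error(targets):
    """Sum of squared deviations from the mean."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def best_supervised_edge(values, targets):
    """Single edge that minimizes within-bucket squared error of the target."""
    pairs = sorted(zip(values, targets))
    best_edge, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no edge can separate equal feature values
        cost = (squared_error([t for _, t in pairs[:i]])
                + squared_error([t for _, t in pairs[i:]]))
        if cost < best_cost:
            best_edge = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_cost = cost
    return best_edge

def bucket_counts(values, edges):
    """Number of values that land in each bucket defined by the edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts

# Hypothetical skewed feature and a binary target that flips between 4 and 5.
values = [1, 2, 2, 3, 4, 5, 8, 13, 40]
targets = [0, 0, 0, 0, 0, 1, 1, 1, 1]

width_counts = bucket_counts(values, equal_width_edges(values, 3))
freq_counts = bucket_counts(values, equal_frequency_edges(values, 3))
supervised_edge = best_supervised_edge(values, targets)

print(width_counts)     # [8, 0, 1]
print(freq_counts)      # [3, 3, 3]
print(supervised_edge)  # 4.5
```

On this sample the equal-width edges leave one bucket empty, the equal-frequency edges balance the counts, and the supervised edge lands exactly where the target changes.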
Why bin at all
- It lets models that only fit linear relationships capture nonlinear effects, because each bin can receive its own coefficient once the bins are encoded as indicator variables.
- It reduces the influence of small fluctuations and noise.
- It produces interpretable groups that stakeholders understand.
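The first benefit can be illustrated with a small sketch. It assumes a made-up U-shaped relationship, y = (x - 5)^2, and hypothetical bin edges at 3.5 and 6.5. It also relies on the fact that a least-squares fit on one indicator variable per bin predicts the mean of the target within each bin, so the per-bin means below stand in for that linear model.

```python
from bisect import bisect_right

# Hypothetical U-shaped data: a straight line cannot follow this shape.
xs = list(range(11))
ys = [(x - 5) ** 2 for x in xs]

def sum_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Ordinary least-squares line on the raw feature.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
line_pred = [intercept + slope * x for x in xs]

# Linear model on bin indicators: its prediction is the mean of y in each bin.
edges = [3.5, 6.5]  # hypothetical edges giving low, middle, and high buckets
bin_ids = [bisect_right(edges, x) for x in xs]
bin_means = {}
for b in set(bin_ids):
    members = [y for y, i in zip(ys, bin_ids) if i == b]
    bin_means[b] = sum(members) / len(members)
bin_pred = [bin_means[i] for i in bin_ids]

sse_line = sum_squared_error(ys, line_pred)
sse_binned = sum_squared_error(ys, bin_pred)
print(f"slope of raw linear fit: {slope:.3f}")
print(f"squared error, raw feature: {sse_line:.1f}")
print(f"squared error, binned feature: {sse_binned:.1f}")
```

Because the data are symmetric around x = 5, the fitted line is flat and explains nothing, while three bins already recover the high, low, high pattern. The remaining error inside each bin is the resolution lost to bucketing.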
The cost is lost resolution. Collapsing fine detail into a few buckets discards information, and bad edge choices can hide real structure. Equal-width bins are also sensitive to outliers: a single extreme value stretches the range and leaves most of the data in one crowded bucket.
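The outlier effect is easy to reproduce. The sketch below uses invented data (the integers 1 through 20, then the same values plus a single outlier of 1000) and hypothetical helper names, and compares equal-width bucket counts before and after the outlier is added.

```python
from bisect import bisect_right

def equal_width_counts(values, n_bins):
    """Bucket counts under equal-width binning of [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(1, n_bins)]
    counts = [0] * n_bins
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts

clean = list(range(1, 21))      # hypothetical well-behaved feature
with_outlier = clean + [1000]   # the same feature plus one extreme value

clean_counts = equal_width_counts(clean, 4)
outlier_counts = equal_width_counts(with_outlier, 4)

print(clean_counts)    # [5, 5, 5, 5]
print(outlier_counts)  # [20, 0, 0, 1]
```

One extreme value is enough to push all twenty ordinary rows into the first bucket, so the binned feature no longer distinguishes among them.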
Key idea
Binning groups continuous values into discrete buckets to capture nonlinearity and reduce noise, trading resolution for robustness and interpretability.