The idea
Target encoding replaces each category with a number derived from the target, usually the average target value for rows in that category. A city becomes its average sale price, a product becomes its average click rate.
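As a minimal sketch of the mapping described above, the snippet below computes a per-category mean and substitutes it for each row. The function name `fit_target_means` and the toy city and price data are made up for illustration; they are not from any particular library.

```python
from collections import defaultdict

def fit_target_means(categories, targets):
    """Return {category: mean target} computed over the given rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

# Hypothetical toy data: city and sale price (in thousands).
cities = ["Austin", "Austin", "Boston", "Boston", "Boston", "Reno"]
prices = [300.0, 340.0, 500.0, 520.0, 480.0, 250.0]

means = fit_target_means(cities, prices)
# Replace each city with its average price.
encoded = [means[c] for c in cities]
```

This naive version uses every row, including the row being encoded, so it is only a starting point.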
Why use it
- It handles high-cardinality categories that would explode one-hot encoding into thousands of columns.
- It packs predictive signal into a single compact column.
- It produces an ordinary numeric column, so tree and linear models alike can use it directly.
The overfitting trap
The danger is that the encoding leaks the target. If a category appears only once, its encoding is exactly that row's own target value, so the model can memorize the answer instead of learning a pattern that generalizes.
- Use smoothing that pulls rare-category averages toward the global mean.
- Compute encodings inside cross-validation folds, so a row never sees its own target.
- This out-of-fold scheme is the standard safe recipe.
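The two defenses above can be sketched together as follows. This is one possible implementation under stated assumptions: the smoothing formula (an additive prior of weight `m` toward the global mean), the default `m=5.0`, and the round-robin fold assignment are all illustrative choices, and the helper names are hypothetical. In practice folds are usually shuffled.

```python
from collections import defaultdict

def fit_smoothed_means(categories, targets, m=5.0):
    """Return ({category: smoothed mean}, global_mean).

    Assumed formula: (sum_of_targets + m * global_mean) / (count + m),
    so categories with few rows stay close to the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    table = {c: (sums[c] + m * global_mean) / (counts[c] + m) for c in sums}
    return table, global_mean

def out_of_fold_encode(categories, targets, n_folds=3, m=5.0):
    """Encode each row using statistics from the other folds only."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Round-robin fold assignment (an assumption; shuffle in practice).
        held_out = [i for i in range(n) if i % n_folds == fold]
        rest = [i for i in range(n) if i % n_folds != fold]
        table, global_mean = fit_smoothed_means(
            [categories[i] for i in rest], [targets[i] for i in rest], m
        )
        for i in held_out:
            # A category absent from the other folds gets their global mean.
            encoded[i] = table.get(categories[i], global_mean)
    return encoded

# Hypothetical toy data: "Reno" appears only once.
cities = ["Austin", "Austin", "Boston", "Boston", "Boston", "Reno"]
prices = [300.0, 340.0, 500.0, 520.0, 480.0, 250.0]
encoded = out_of_fold_encode(cities, prices)
```

Because the single "Reno" row is encoded from the other folds, its encoding no longer equals its own price, which is the leak the out-of-fold scheme is meant to close.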
Practical notes
- Add a small amount of random noise to the training encodings to further reduce leakage.
- Handle unseen categories at prediction time by falling back to the global mean.
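Both notes can be sketched in a few lines. The helper names, the Gaussian noise, and its `scale` are assumptions chosen for illustration, and the fitted table and global mean are hypothetical values rather than results from real data.

```python
import random

def add_noise(encoded, scale=0.01, seed=0):
    """Add small Gaussian noise to training encodings (assumed noise model)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in encoded]

def encode_new(categories, table, global_mean):
    """Encode prediction-time rows; unseen categories get the global mean."""
    return [table.get(c, global_mean) for c in categories]

table = {"Austin": 320.0, "Boston": 500.0}  # hypothetical fitted encodings
global_mean = 428.0                          # hypothetical training mean

# "Tulsa" was never seen in training, so it falls back to the global mean.
new_rows = encode_new(["Boston", "Tulsa"], table, global_mean)

# Noise is applied to training-time encodings only, not at prediction time.
noisy = add_noise([320.0, 500.0], scale=1.0)
```

The noise scale should be small relative to the spread of the target; how small is a tuning decision that depends on the data.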
Key idea
Target encoding maps a category to its average target, which is compact and powerful but must be smoothed and computed out-of-fold to avoid leaking the answer.