The CatBoost Specifics

Ordered boosting and ordered target statistics that tame categorical features and target leakage.

Boosting built around categories

CatBoost is a gradient boosting library designed to handle categorical features natively and to avoid a subtle form of target leakage that affects conventional boosting implementations. Its two signature ideas are ordered target statistics and ordered boosting.

The target leakage problem

A common trick, called target encoding, replaces each category with the mean target value for that category. Done naively, each row's encoding includes its own label, which leaks the target into the feature and inflates training accuracy. CatBoost instead computes these statistics using only the rows that precede the current row in a random permutation, so no row sees its own label.
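The contrast between naive and ordered target statistics can be sketched in a few lines of plain Python. This is an illustrative toy, not CatBoost's implementation: the function names, the `prior` and `prior_weight` parameters, and the single fixed permutation are assumptions made for the example. The smoothing toward a prior value stands in for what an encoder has to do when a category has no preceding rows.

```python
import random
from collections import defaultdict


def naive_target_mean(categories, targets):
    """Encode each row with the mean target of its category, own label included."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return [sums[cat] / counts[cat] for cat in categories]


def ordered_target_stats(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode each row from rows that precede it in a random permutation.

    `prior` and `prior_weight` are hypothetical smoothing parameters: a row
    with no preceding rows of its category is encoded as `prior`.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # one random permutation of the rows

    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = [0.0] * len(categories)
    for i in order:
        cat = categories[i]
        # Use only the statistics accumulated from earlier rows.
        encoded[i] = (sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight)
        # Only now does this row's label become visible to later rows.
        sums[cat] += targets[i]
        counts[cat] += 1
    return encoded
```

When every category is unique, the naive encoding reproduces the labels exactly, which is the leak in its purest form, while the ordered encoding returns the prior for every row. In the ordered version, changing a row's label never changes that row's own encoding.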

Ordered boosting

The same leakage can occur when computing residuals: in standard boosting, the model that produces a row's residual was itself trained on that row. Ordered boosting maintains models trained on prefixes of a permutation, so each row gets its residual from a model that never saw it. This reduces prediction shift, a bias that standard boosting suffers from.
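The prefix idea can be illustrated with a deliberately simple stand-in for the model. In the sketch below, the "model" is just the mean target per leaf, and `leaves` is a hypothetical precomputed leaf assignment for each row; both are assumptions made to keep the example short, and the real algorithm maintains its supporting models far more efficiently than this.

```python
import random
from collections import defaultdict


def plain_residuals(leaves, targets):
    """Residuals from a model fit on all rows, including the row being scored."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for leaf, y in zip(leaves, targets):
        sums[leaf] += y
        counts[leaf] += 1
    return [y - sums[leaf] / counts[leaf] for leaf, y in zip(leaves, targets)]


def ordered_residuals(leaves, targets, seed=0):
    """Residuals where each row is scored by a model fit only on earlier rows."""
    order = list(range(len(targets)))
    random.Random(seed).shuffle(order)  # the permutation that defines "earlier"

    sums = defaultdict(float)
    counts = defaultdict(int)
    residuals = [0.0] * len(targets)
    for i in order:
        leaf = leaves[i]
        # Prediction from the prefix model; 0.0 if the leaf is still empty.
        prediction = sums[leaf] / counts[leaf] if counts[leaf] else 0.0
        residuals[i] = targets[i] - prediction
        # Add this row to the model only after its residual is computed.
        sums[leaf] += targets[i]
        counts[leaf] += 1
    return residuals
```

If every row lands in its own leaf, the plain residuals are all zero because the model has memorized each label, whereas the ordered residuals equal the raw targets. That gap is a small-scale picture of the shift between training-time and test-time predictions.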

Other traits

It builds symmetric (also called oblivious) trees, in which every node at a given depth uses the same split, making inference very fast.
It automatically builds and encodes combinations of categorical features.
Strong defaults mean it often works well with little tuning.
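To see why symmetric trees make inference fast, here is a minimal sketch of prediction with a single oblivious tree. Because each depth shares one split, the tree reduces to a list of (feature, threshold) pairs plus a table of leaf values, and the leaf index is simply the sequence of comparison results read as a binary number. The data layout and the function name are assumptions for illustration, not CatBoost's internal format.

```python
def oblivious_tree_predict(x, splits, leaf_values):
    """Predict with one symmetric tree.

    x           : sequence of numeric feature values
    splits      : list of (feature_index, threshold), one pair per depth level
    leaf_values : list of 2 ** len(splits) leaf predictions
    """
    index = 0
    for feature, threshold in splits:
        # Each level contributes one bit: 1 if the feature exceeds the threshold.
        bit = int(x[feature] > threshold)
        index = (index << 1) | bit
    return leaf_values[index]


# A depth-2 tree: level 0 tests feature 0, level 1 tests feature 1.
splits = [(0, 0.5), (1, 2.0)]
leaf_values = [10.0, 20.0, 30.0, 40.0]

print(oblivious_tree_predict([0.7, 1.0], splits, leaf_values))  # bits 1,0 -> leaf 2 -> 30.0
print(oblivious_tree_predict([0.1, 3.0], splits, leaf_values))  # bits 0,1 -> leaf 1 -> 20.0
```

There is no pointer chasing between nodes and no data-dependent branching on tree structure: every row performs the same comparisons in the same order, which is what makes this layout easy to vectorize.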

Key idea

CatBoost prevents target leakage with ordered target statistics computed from prior rows and with ordered boosting for residuals, while symmetric trees and native categorical handling give fast, robust models.

