A proportion of variance
R squared, the coefficient of determination, is the fraction of the variance in the target that the model explains, compared to just predicting the mean.
- 1.0 means perfect prediction
- 0 means no better than the mean
- Negative values are possible on held-out data when the model predicts worse than the mean would
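A minimal sketch of the definition in plain Python. It assumes the usual convention: R squared is 1 minus the residual sum of squares divided by the total sum of squares around the mean of the true values. The helper name `r_squared` and the toy data are illustrative and not taken from any library.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # model's squared errors
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # mean baseline's squared errors
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))                     # perfect prediction: 1.0
print(r_squared(y, [2.5] * 4))             # always predict the mean: 0.0
print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean: -3.0
```

The three calls line up with the three cases listed above: a perfect fit, the mean baseline, and a model whose errors exceed the baseline's.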
The inflation problem
For a least-squares model scored on its own training data, adding any feature, even pure noise, can only keep R squared the same or raise it. So a bigger model never looks worse on training R squared, which tempts overfitting.
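A small simulation of the inflation effect. It assumes NumPy is available and uses ordinary least squares with an intercept. The helper `ols_r_squared`, the seed, and the five noise columns are hypothetical choices made only for illustration.

```python
import numpy as np

def ols_r_squared(X, y):
    """Fit OLS with an intercept and return the training R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares coefficients
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # one real signal plus noise
noise = rng.normal(size=(n, 5))    # five features unrelated to y

r2_small = ols_r_squared(x[:, None], y)
r2_big = ols_r_squared(np.column_stack([x, noise]), y)
print(r2_small, r2_big)            # r2_big >= r2_small on the training data
```

The smaller model is nested inside the larger one, so least squares could always set the noise coefficients to zero. Training R squared therefore cannot drop. The exact values depend on the seed.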
The adjustment
Adjusted R squared penalizes extra predictors. It rises only if a new feature improves the fit more than chance would. It can fall when you add a useless variable.
- Use R squared to describe variance explained on a fixed model
- Use adjusted R squared to compare models with different numbers of features
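One common form of the adjustment is adjusted R squared = 1 - (1 - R^2)(n - 1)/(n - p - 1). Here n is the number of observations and p is the number of predictors, not counting the intercept. The sketch below uses a hypothetical function name and made-up R squared values to show how the penalty can outweigh a small gain in raw fit.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - r2) * (n - 1) / (n - p - 1).

    n: number of observations; p: number of predictors, not counting the intercept.
    """
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison: five extra features raise raw R^2 only slightly.
small = adjusted_r_squared(0.800, n=50, p=1)
big = adjusted_r_squared(0.805, n=50, p=6)
print(round(small, 3), round(big, 3))  # 0.796 0.778: the larger model scores lower
```

Raw R squared favors the six-predictor model. The adjusted version reverses the ranking because the small improvement does not offset the five extra degrees of freedom.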
A caution
A high R squared does not prove a good model. It says nothing about systematic patterns (bias) in the residuals, about causation, or about how well the model generalizes. Always pair it with a residual plot.
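To make the caution concrete, here is a sketch that fits a straight line to data that is actually quadratic. It assumes NumPy. The data are synthetic and noise-free by construction. R squared comes out high, yet the residuals follow a clear U shape, which a residual plot would show at a glance.

```python
import numpy as np

# Hypothetical data: the true relationship is quadratic, but we fit a straight line.
x = np.linspace(0.0, 10.0, 50)
y = x ** 2

slope, intercept = np.polyfit(x, y, 1)  # least-squares straight line
fitted = slope * x + intercept
resid = y - fitted

r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
print(f"R^2 = {r2:.3f}")  # high, despite the wrong model form

# The residuals are not random: positive at both ends, negative in the middle.
print(np.sign(resid[[0, 25, 49]]))
```

Plotting `resid` against `fitted` would show the same U shape. That systematic pattern is the warning sign a single R squared value hides.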
Key idea
R squared measures variance explained, but for a least-squares model scored on its training data it never decreases when you add features. Adjusted R squared corrects for feature count and is the fairer comparison.