A relative score for regression
Raw error depends on the scale of the target, so it is hard to judge on its own. R squared, the coefficient of determination, gives a scale-free score by comparing your model to a trivial baseline that always predicts the mean of the target.
The comparison
R squared equals one minus the ratio of your model's squared error to the squared error of the mean baseline.
- An R squared of one means the predictions match every target exactly, so the model explains all the variance.
- An R squared of zero means it does no better than always guessing the mean.
- A negative R squared means the model is worse than the mean baseline.
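The three cases above can be checked with a minimal sketch that computes R squared directly from its definition. The function name r_squared and the toy numbers are illustrative assumptions, not part of the original text.

```python
def r_squared(y_true, y_pred):
    """Return 1 - (model squared error) / (mean-baseline squared error)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of the baseline that always predicts the mean.
    ss_baseline = sum((t - mean_y) ** 2 for t in y_true)
    if ss_baseline == 0:
        raise ValueError("R squared is undefined when all targets are equal")
    return 1 - ss_model / ss_baseline


# Hypothetical toy targets, chosen only for illustration.
y = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = r_squared(y, y)                           # 1.0
baseline = r_squared(y, [3.0] * 5)                  # 0.0, always predicts the mean
worse = r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0])     # -3.0, worse than the mean
```

Reversing the targets gives a squared error four times that of the mean baseline, which is why the last score is one minus four.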
Reading it carefully
- R squared measures the share of variance explained, not whether predictions are unbiased; a model can score well and still be systematically off in part of the range.
- Adding more features can inflate training R squared even when they are useless, because a least-squares fit never scores lower with an extra feature; adjusted R squared penalizes extra features to compensate.
- A high R squared on training data can collapse on new data, so always check it on a held-out set.
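As a rough illustration of the feature penalty, the sketch below uses the commonly quoted adjusted R squared formula, 1 - (1 - R squared) * (n - 1) / (n - p - 1), which assumes a model with an intercept, n samples, and p features. The function name and the example values are assumptions made for illustration.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R squared, assuming a model with an intercept term."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need more samples than features plus one")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Same raw R squared (hypothetical value), different feature counts.
few_features = adjusted_r_squared(0.80, n_samples=50, n_features=2)
many_features = adjusted_r_squared(0.80, n_samples=50, n_features=20)
```

With the raw score held fixed, the version with 20 features is adjusted down further than the version with 2, which is the penalty at work. It does not replace scoring on held-out data.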
Key idea
R squared compares your model against simply predicting the mean. One is perfect, zero matches the baseline, and a negative value means your model is worse than guessing the average.