The objective
Ordinary least squares (OLS) chooses the weights that minimize the sum of squared residuals, where a residual is the gap between a prediction and its true value. Squaring punishes large misses more than small ones and keeps the objective smooth and convex, so it can be minimized with calculus.
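As a rough illustration of the objective, the sketch below computes the sum of squared residuals for two candidate weight vectors. The function name `sum_squared_residuals` and the toy data are assumptions made up for this example; the first column of X is a column of ones standing in for the bias.

```python
import numpy as np

def sum_squared_residuals(X, y, w):
    """Sum of squared gaps between the predictions X @ w and the targets y."""
    residuals = y - X @ w
    return float(residuals @ residuals)

# Toy data (made up for illustration): a bias column of ones, then one feature.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

w_exact = np.array([1.0, 2.0])  # y = 1 + 2x passes through every point
w_off = np.array([1.0, 1.0])    # y = 1 + x misses by 0, 1, and 2

loss_exact = sum_squared_residuals(X, y, w_exact)  # 0.0
loss_off = sum_squared_residuals(X, y, w_off)      # 0 + 1 + 4 = 5.0
```

The miss of 2 contributes 4 to the loss while the miss of 1 contributes only 1, which is the sense in which squaring punishes large misses more.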
The closed form
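For a feature matrix X and targets y, the optimal weights solve a single linear system, the normal equations: X transpose X times the weights equals X transpose y. When X transpose X is invertible, which requires the columns of X to be linearly independent, the weights equal the inverse of X transpose X times X transpose y, and no iteration is needed.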
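A minimal sketch of the closed form, transcribing the formula literally with NumPy. The function name `ols_normal_equations` and the data are hypothetical, and the explicit inverse is used only to mirror the formula in the text, not as a recommendation.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Weights = inverse(X^T X) @ X^T y, written out literally for illustration."""
    gram = X.T @ X               # X transpose X
    return np.linalg.inv(gram) @ (X.T @ y)

# Made-up data: a bias column of ones plus one feature, with slightly noisy targets.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])

w = ols_normal_equations(X, y)   # approximately [1.09, 1.94]: intercept, slope

# At the optimum the residuals are orthogonal to every column of X.
gradient_check = X.T @ (y - X @ w)
```

The last line is a sanity check rather than part of the method: if `w` really satisfies the normal equations, `gradient_check` should be zero up to rounding.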
When it struggles
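- If features are nearly collinear, X transpose X is close to singular, and the weights become very large and swing wildly under small changes in the data.
- Forming X transpose X costs time proportional to the number of samples times the square of the number of features, and inverting it costs roughly cubic time in the number of features, which is slow when features are many.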
- Squared error is sensitive to outliers, which pull the line toward themselves.
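Two of the weaknesses above can be seen on small synthetic data. The sketch below is illustrative only: the data, the size of the perturbation, and the size of the outlier are all assumptions chosen for the example. It compares the condition number of X transpose X for distinct and near-duplicate features, then refits a line after corrupting a single target.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20)
ones = np.ones_like(x)

# Collinearity: conditioning of X^T X with distinct vs. near-duplicate features.
X_distinct = np.column_stack([ones, x, x**2])
X_near_dup = np.column_stack([ones, x, x + 1e-6 * np.sin(7.0 * x)])
cond_distinct = np.linalg.cond(X_distinct.T @ X_distinct)
cond_near_dup = np.linalg.cond(X_near_dup.T @ X_near_dup)  # many orders of magnitude larger

# Outliers: fit a line to clean data, then corrupt one target and refit.
X_line = np.column_stack([ones, x])
y_clean = 1.0 + 2.0 * x
y_outlier = y_clean.copy()
y_outlier[-1] += 10.0            # a single bad measurement at the right edge

w_clean, *_ = np.linalg.lstsq(X_line, y_clean, rcond=None)
w_outlier, *_ = np.linalg.lstsq(X_line, y_outlier, rcond=None)

slope_clean = w_clean[1]         # 2.0 up to rounding
slope_outlier = w_outlier[1]     # noticeably steeper, pulled toward the outlier
```

A large condition number means small changes in the data can produce large changes in the weights, which is the instability the first bullet describes.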
Practical notes
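- Center and scale features so the weights are comparable across features and the problem is better conditioned; with centered features, the bias is simply the mean of the targets.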
- Prefer a numerical solver, for example one based on a QR or SVD factorization, over explicitly inverting a matrix; it is more stable.
- Ridge regularization rescues the collinear case by adding a small positive term to the diagonal of X transpose X, which makes it invertible.
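The notes above might be combined along these lines. This is a sketch under stated assumptions: the helper names `standardize`, `fit_lstsq`, and `fit_ridge`, the synthetic data, and the regularization strength `lam=1e-3` are all made up for illustration. `np.linalg.lstsq` stands in for "a numerical solver", and the ridge variant adds `lam` to the diagonal of X transpose X.

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def fit_lstsq(X, y):
    """Least squares fit via an SVD-based solver, without forming an inverse."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def fit_ridge(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y, with lam > 0 added to the diagonal."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Made-up data: the second feature is almost an exact copy of the first.
x1 = np.linspace(-1.0, 1.0, 50)
x2 = x1 + 1e-6 * np.cos(5.0 * x1)
y = 3.0 * x1 + 0.1 * np.cos(3.0 * x1)

Z = standardize(np.column_stack([x1, x2]))
y_centered = y - y.mean()        # with centered features, the bias is the mean of y

w_ols = fit_lstsq(Z, y_centered)               # tends to be huge, with opposite signs
w_ridge = fit_ridge(Z, y_centered, lam=1e-3)   # stays moderate in size
```

On this data the unregularized weights should come out very large while the ridge weights stay moderate. The right value of `lam` depends on the problem and is usually chosen by validation rather than fixed in advance.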
Key idea
Ordinary least squares minimizes squared residuals and has a closed-form solution via the normal equations. Collinearity and outliers are its main weaknesses.