The Bias Term
In a linear model the prediction is a weighted sum of the inputs plus an extra constant called the bias term. Though it is only a single number, it determines whether the model can fit data that does not pass through the origin.
Consider a line written as output = weight * input + bias. The weight controls the slope, that is, how steeply the output rises with the input. The bias controls the intercept, the output value when the input is zero. Without a bias the line is forced through the origin, which rarely matches real data.
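The roles of the two parameters can be checked directly. The following is a minimal sketch; the function name predict and the parameter values are hypothetical, chosen only for illustration.

```python
def predict(x, weight, bias=0.0):
    """Linear prediction: weight * x + bias."""
    return weight * x + bias

# Hypothetical parameter values, not taken from any real model.
weight, bias = 2.0, 5.0

# At zero input the prediction equals the bias (the intercept).
at_zero_with_bias = predict(0.0, weight, bias)

# With no bias, the prediction at zero input is always zero.
at_zero_without_bias = predict(0.0, weight)

# Raising the input by one unit raises the output by the weight (the slope).
slope = predict(1.0, weight, bias) - predict(0.0, weight, bias)

print(at_zero_with_bias, at_zero_without_bias, slope)
```

Changing bias moves the whole line up or down without altering the slope, while changing weight tilts the line around its intercept.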
Why it matters:
- The bias lets the model shift its output up or down independently of the inputs
- It absorbs any constant offset in the target that the features cannot explain
- Without it, the model is needlessly constrained and often underfits
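The points above can be illustrated with a small least-squares comparison. This is a sketch under stated assumptions: the data are synthetic (a line with slope 2 and a constant offset of 5), and the helper names fit_without_bias, fit_with_bias, and mean_squared_error are hypothetical.

```python
# Synthetic data: targets carry a constant offset of 5 that the input alone cannot explain.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 5.0 for x in xs]

def fit_without_bias(xs, ys):
    # Least-squares slope for a line forced through the origin.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_with_bias(xs, ys):
    # Ordinary least squares for one input: slope from covariance / variance,
    # intercept chosen so the line passes through the point of means.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def mean_squared_error(xs, ys, w, b=0.0):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w_only = fit_without_bias(xs, ys)
w, b = fit_with_bias(xs, ys)

mse_without_bias = mean_squared_error(xs, ys, w_only)
mse_with_bias = mean_squared_error(xs, ys, w, b)

print("no bias:   w =", w_only, "mse =", mse_without_bias)
print("with bias: w =", w, "b =", b, "mse =", mse_with_bias)
```

On this data the model with a bias recovers the offset, while the model without one distorts its slope to compensate and still leaves a large error.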
A common trick is to treat the bias as a weight on a constant input of one. This lets the same update machinery learn the bias just like any other weight, with no special case.
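The constant-input trick can be sketched with plain gradient descent on mean squared error. The function name train, the learning rate, the step count, and the data are all assumptions made for illustration; any update rule that treats weights uniformly would work the same way.

```python
def train(inputs, targets, learning_rate=0.05, steps=2000):
    # Append a constant 1 to every input row; the last weight then acts as the bias.
    augmented = [row + [1.0] for row in inputs]
    weights = [0.0] * len(augmented[0])
    n = len(augmented)

    for _ in range(steps):
        grads = [0.0] * len(weights)
        for row, target in zip(augmented, targets):
            error = sum(w * x for w, x in zip(weights, row)) - target
            # Same gradient formula for every weight, including the bias slot.
            for j, x in enumerate(row):
                grads[j] += 2.0 * error * x / n
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]

    return weights

# Synthetic data: slope 2, constant offset 5.
inputs = [[0.0], [1.0], [2.0], [3.0], [4.0]]
targets = [5.0, 7.0, 9.0, 11.0, 13.0]

learned = train(inputs, targets)
learned_weight, learned_bias = learned[0], learned[1]
print(learned_weight, learned_bias)
```

Because the bias is just the weight on the constant column, the loop contains no branch that handles it separately.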
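Do not confuse this bias term, a learnable parameter, with statistical bias, the systematic error of a model that is too rigid to capture the pattern in the data. High statistical bias is associated with underfitting; overfitting is associated with high variance instead. The two share a name but describe different ideas. One is a number in the equation; the other is a property of how rigid a model is.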
Key idea
The bias term is a learnable offset that lets a model shift its output independently of the inputs, so its fit is no longer forced through the origin.