What it is
A loss function turns the gap between a model's prediction and the truth into a single number. Training works by minimizing this number, so the loss effectively defines what the model is trying to achieve.
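To make "training minimizes this number" concrete, here is a minimal sketch using plain gradient descent on a one-parameter model y = w * x with a squared-error loss. The function names, the toy data, the learning rate, and the step count are all illustrative assumptions, not part of any particular library.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of mse_loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, w=0.0, lr=0.05, steps=200):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, ys)
    return w


# Toy data generated by y = 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_fit = train(xs, ys)
print(w_fit, mse_loss(w_fit, xs, ys))  # w_fit ends up very close to 2
```

The only thing the training loop ever consults is the loss and its gradient, which is why the loss ends up defining what the model learns.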
Common choices
Different tasks use different losses:
- Mean squared error for regression, which penalizes large errors heavily by squaring them
- Mean absolute error for regression when outliers should matter less
- Cross entropy for classification, which rewards confident correct predictions and harshly penalizes confident wrong ones
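The three losses in the list can be sketched in a few lines of plain Python. This is a simplified illustration: the function names are hypothetical, and the cross entropy here handles a single example whose predicted class probabilities are given as a list.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large errors dominate."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; every unit of error counts equally."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(true_class, probs):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_class])


# Two regression errors, one small (1) and one large (10).
y_true = [0.0, 0.0]
y_pred = [1.0, 10.0]
print(mean_squared_error(y_true, y_pred))   # 50.5, dominated by the large error
print(mean_absolute_error(y_true, y_pred))  # 5.5

# The same confident prediction, scored when it is right and when it is wrong.
probs = [0.9, 0.1]
print(cross_entropy(0, probs))  # about 0.105 (confident and correct)
print(cross_entropy(1, probs))  # about 2.303 (confident and wrong)
```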
Why the choice matters
The loss shapes behavior. Squaring errors makes a model very sensitive to outliers, because a single large error can dominate the total. Cross entropy pushes the predicted probability of the correct class toward one and the probabilities of the other classes toward zero. Picking a loss that matches your real goal is one of the most important modeling decisions.
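One way to see the outlier effect is to ask which single constant prediction each loss prefers. For squared error the best constant is the mean of the data, and for absolute error it is the median. The sketch below checks this on a small made-up dataset with one outlier; the data and helper names are assumptions chosen for illustration.

```python
import statistics


def mse_of_constant(c, data):
    """Mean squared error when every point is predicted as c."""
    return sum((x - c) ** 2 for x in data) / len(data)


def mae_of_constant(c, data):
    """Mean absolute error when every point is predicted as c."""
    return sum(abs(x - c) for x in data) / len(data)


# Four ordinary values and one outlier (illustrative data).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

mean = statistics.mean(data)      # 22.0, dragged toward the outlier
median = statistics.median(data)  # 3.0, stays with the bulk of the data

# Squared error prefers the mean; absolute error prefers the median.
print(mse_of_constant(mean, data), mse_of_constant(median, data))
print(mae_of_constant(mean, data), mae_of_constant(median, data))
```

A model trained with squared error is pulled toward the outlier in the same way the mean is, while one trained with absolute error behaves more like the median.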
Surrogate losses
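We often optimize a smooth surrogate loss because the metric we truly care about, like accuracy, gives no useful gradient. Accuracy only changes when a prediction flips from one class to another, so it is flat almost everywhere and jumps at the decision boundary. Cross entropy is a smooth stand-in for classification accuracy.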
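The following sketch illustrates why accuracy is a poor training signal, assuming a binary classifier that outputs one logit per example and predicts class 1 when the sigmoid of the logit exceeds 0.5. The logits, labels, and function names are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(logits, labels):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [1 if sigmoid(z) > 0.5 else 0 for z in logits]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def binary_cross_entropy(logits, labels):
    """Average negative log-likelihood of the labels under sigmoid(logit)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [1, 0, 1]
before = [0.2, -0.3, -0.1]   # third example is misclassified
after = [0.3, -0.4, -0.05]   # every logit nudged in the right direction

# Accuracy does not register the improvement; cross entropy does.
print(accuracy(before, labels), accuracy(after, labels))
print(binary_cross_entropy(before, labels), binary_cross_entropy(after, labels))
```

Every logit moved toward its label, yet accuracy is unchanged because no prediction crossed the threshold. Cross entropy decreases, so it can guide small parameter updates where accuracy cannot.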
Key idea
The loss function encodes the objective, and choosing one that reflects your real goal steers the entire training process.