What a loss does
A loss function maps predictions and targets to a single number measuring how wrong the model is. Training minimizes it, so the loss defines the goal.
Common choices
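- Mean squared error: for regression; penalizes the squared difference between prediction and target, so large errors dominate and it is sensitive to outliers.
- Mean absolute error: for regression; more robust to outliers because the penalty grows only linearly with the error. It is harder to optimize because it is not differentiable at zero error, and its gradient does not shrink as predictions approach the target.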
- Cross entropy: for classification, penalizes confident wrong predictions heavily.
- Hinge loss: for margin-based classifiers such as SVMs; penalizes predictions that are misclassified or fall inside the margin.
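The four losses above can be written in a few lines each. This is a minimal sketch in plain Python (stdlib only). The function names, the toy data, and the choice of labels in {-1, +1} for hinge loss are illustrative assumptions, not taken from any particular library. The two regression prediction lists differ only in one prediction that is off by 10, which makes the outlier sensitivity of squared versus absolute error visible.

```python
import math

def mse(preds, targets):
    """Mean squared error: average of squared residuals."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def mae(preds, targets):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def cross_entropy(probs, label):
    """Negative log-probability the model assigns to the true class index `label`."""
    return -math.log(probs[label])

def hinge(score, label):
    """Hinge loss for a raw score and a label in {-1, +1} (margin of 1 assumed)."""
    return max(0.0, 1.0 - label * score)

targets = [1.0, 2.0, 3.0, 4.0]
clean = [1.1, 2.1, 2.9, 4.0]
outlier = [1.1, 2.1, 2.9, 14.0]  # same as `clean` except one prediction is off by 10

# The single bad prediction inflates MSE far more than MAE.
print("MSE clean vs outlier:", mse(clean, targets), mse(outlier, targets))
print("MAE clean vs outlier:", mae(clean, targets), mae(outlier, targets))

# Confidently wrong (true class gets 0.01) costs far more than confidently right.
print("CE wrong:", cross_entropy([0.98, 0.01, 0.01], 1))
print("CE right:", cross_entropy([0.01, 0.98, 0.01], 1))

# Hinge: zero beyond the margin, positive inside it or on the wrong side.
print("hinge:", hinge(2.0, 1), hinge(0.5, 1), hinge(-1.0, 1))
```

Going from `clean` to `outlier`, MSE rises from about 0.0075 to about 25, while MAE rises only from 0.075 to 2.575.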
Matching loss to task
The loss encodes assumptions about the data. Squared error corresponds to assuming Gaussian noise on the targets. Cross entropy assumes a probabilistic model of the classes and pairs with a softmax output (multi-class) or a sigmoid output (binary).
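The Gaussian assumption behind squared error can be made concrete with a short derivation. Assume, for illustration, that each target is modeled as $y_i = \hat{y}_i + \varepsilon_i$ with independent noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and a fixed $\sigma$, over $n$ examples. The negative log-likelihood then splits into a squared-error term and a constant:

```latex
-\log p(y \mid \hat{y})
  = -\sum_{i=1}^{n} \log\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left(-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}\right)\right]
  = \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
    + \frac{n}{2}\log\!\left(2\pi\sigma^2\right)
```

With $\sigma$ fixed, the second term does not depend on $\hat{y}$, so maximizing the Gaussian likelihood and minimizing squared error pick the same predictions. The same argument with Laplace-distributed noise yields absolute error instead.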
- Choose a loss that reflects what mistakes cost.
- The output activation should match the loss.
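The softmax and cross entropy pairing can be sketched in plain Python (stdlib only; the function names and example logits are hypothetical, chosen for illustration). The loss is computed directly from the raw scores (logits) using the log-sum-exp trick. The gradient with respect to the logits reduces to "predicted probabilities minus the one-hot target", which is one concrete reason this pairing trains well: the gradient is large when the model is confidently wrong and small when it is right.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, label):
    """Cross entropy of softmax(logits) against the true class index.

    Uses -log softmax(z)[label] = logsumexp(z) - z[label], computed stably.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def grad_wrt_logits(logits, label):
    """Gradient of softmax cross entropy w.r.t. the logits: probs minus one-hot target."""
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]

logits = [2.0, 0.5, -1.0]  # hypothetical raw scores for three classes
label = 1                  # true class index

print("loss:", softmax_cross_entropy(logits, label))
print("gradient:", grad_wrt_logits(logits, label))
```

Computing the loss from logits with log-sum-exp, rather than taking the log of a separately computed softmax, avoids `log(0)` when a probability underflows. The gradient entry for the true class is negative, so a gradient-descent step raises that class's score.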
A well-chosen loss makes the gradient point toward genuinely better models; a poorly matched one can steer training toward a model that scores well on the loss but badly on the mistakes that actually matter.
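A small sketch shows that the loss decides what "better" means. It assumes a deliberately simple hypothetical model: a single constant prediction `c` for every example, fit by (sub)gradient descent on a toy dataset with one outlier. Minimizing squared error drives `c` to the mean of the data, while minimizing absolute error drives it to the median, so the same data and optimizer give different answers depending only on the loss. The learning rates and step counts are illustrative choices.

```python
def fit_constant(data, grad_fn, lr, steps):
    """Fit one constant prediction c by (sub)gradient descent on the given loss."""
    c = 0.0
    for _ in range(steps):
        c -= lr * grad_fn(c, data)
    return c

def mse_grad(c, data):
    # d/dc of mean((c - y)^2) is 2 * mean(c - y)
    return 2.0 * sum(c - y for y in data) / len(data)

def mae_grad(c, data):
    # Subgradient of mean(|c - y|): mean of sign(c - y), taking 0 at an exact tie
    return sum((c > y) - (c < y) for y in data) / len(data)

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy data; 100.0 is the outlier

c_mse = fit_constant(data, mse_grad, lr=0.1, steps=500)
c_mae = fit_constant(data, mae_grad, lr=0.01, steps=5000)
print("MSE fit:", c_mse)  # converges to the mean, 22.0
print("MAE fit:", c_mae)  # ends within a few thousandths of the median, 3.0
```

The MAE run also illustrates why absolute error is harder to optimize: with a fixed step size the subgradient keeps the same magnitude near the optimum, so `c` oscillates around 3 instead of settling exactly.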
Key idea
The loss function defines what the model optimizes, so matching it to the task and output activation is what makes gradients point toward genuinely better predictions.