Loss Functions

What it is

A loss function turns the gap between a model's prediction and the truth into a single number. Training works by minimizing this number, so the loss effectively defines what the model is trying to achieve.

Common choices

Different tasks use different losses:

Mean squared error for regression, which penalizes large errors heavily by squaring them
Mean absolute error for regression when outliers should matter less
Cross entropy for classification, which gives a low loss to confident correct predictions and harshly penalizes confident wrong ones
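
As a rough sketch, the three losses listed here can each be written in a few lines of plain Python. The function names, the list-based inputs, and the eps floor that guards against log(0) are illustrative assumptions, not taken from any particular library.

```python
import math

def mean_squared_error(y_true, y_pred):
    # Average of squared differences: large errors dominate the total.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    # Average of absolute differences: every unit of error counts the same.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(labels, probs, eps=1e-12):
    # labels: integer class index per example.
    # probs: one list of predicted class probabilities per example.
    # eps is an assumed floor that keeps log() away from zero.
    total = 0.0
    for label, row in zip(labels, probs):
        p_correct = max(row[label], eps)
        total += -math.log(p_correct)
    return total / len(labels)
```

For a single example whose true class is 0, a prediction of [0.99, 0.01] gives a loss near zero, while [0.01, 0.99] gives a loss above 4, which is the "confident wrong" penalty described in the list.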

Why the choice matters

The loss shapes behavior. Because squaring magnifies large errors, a model trained with mean squared error is very sensitive to outliers. Cross entropy pushes the predicted probability of the correct class toward one and the probabilities of the other classes toward zero. Picking a loss that matches your real goal is one of the most important modeling decisions.
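
One way to see the outlier effect is to fit the simplest possible model, a single constant prediction, to data with one extreme value. The sketch below uses made-up numbers and hypothetical helper names. It relies on a standard fact: the constant that minimizes squared error is the mean, and the constant that minimizes absolute error is the median.

```python
import statistics

def mse_of_constant(data, c):
    # Mean squared error when every prediction is the constant c.
    return sum((x - c) ** 2 for x in data) / len(data)

def mae_of_constant(data, c):
    # Mean absolute error when every prediction is the constant c.
    return sum(abs(x - c) for x in data) / len(data)

# Illustrative data: four ordinary values and one outlier.
data = [1, 2, 3, 4, 100]

mean_fit = statistics.mean(data)      # 22, dragged toward the outlier
median_fit = statistics.median(data)  # 3, stays with the bulk of the data

# The mean is the better constant under squared error...
assert mse_of_constant(data, mean_fit) < mse_of_constant(data, median_fit)
# ...and the median is the better constant under absolute error.
assert mae_of_constant(data, median_fit) < mae_of_constant(data, mean_fit)
```

The single value of 100 moves the squared-error fit from 3 to 22, while the absolute-error fit does not move at all.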

Surrogate losses

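We often optimize a smooth surrogate loss because the metric we truly care about, such as accuracy, gives no useful gradient: it is piecewise constant, so a small change in the model's parameters usually leaves it unchanged. Cross entropy is a smooth stand-in for classification accuracy.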
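
A small binary example makes the difference concrete. In this sketch the labels, the probabilities, the 0.5 decision threshold, and the function names are all assumptions chosen for illustration. Two sets of predictions classify every example correctly, so accuracy cannot tell them apart, yet cross entropy still distinguishes the hesitant set from the confident one.

```python
import math

def accuracy(labels, probs, threshold=0.5):
    # probs holds the predicted probability of class 1 for each example.
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(pred == y) for pred, y in zip(preds, labels)) / len(labels)

def binary_cross_entropy(labels, probs, eps=1e-12):
    # eps is an assumed clamp that keeps log() away from zero.
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

labels = [1, 0, 1]
hesitant = [0.60, 0.40, 0.55]   # correct, but only just
confident = [0.90, 0.10, 0.95]  # correct, with high probability

# Accuracy is identical for both sets of predictions.
assert accuracy(labels, hesitant) == accuracy(labels, confident) == 1.0
# Cross entropy still prefers the confident set.
assert binary_cross_entropy(labels, confident) < binary_cross_entropy(labels, hesitant)
```

Because accuracy stays flat between these two sets of predictions, it gives an optimizer nothing to follow, whereas the cross entropy keeps decreasing as the probabilities improve.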

Key idea

The loss function encodes the objective, and choosing one that reflects your real goal steers the entire training process.

