Loss Functions Overview

The loss defines what good means, and the right one depends on the task.

What a loss does

A loss function maps predictions and targets to a single number measuring how wrong the model is. Training minimizes it, so the loss defines the goal.
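To see how the loss defines the goal, here is a minimal sketch in plain Python. It assumes the simplest possible "model", a single constant prediction, and searches a grid of candidate values for the one with the lowest loss. The data, the grid, and the helper names (`mse`, `mae`) are illustrative assumptions, not part of the lesson.

```python
def mse(constant, targets):
    """Mean squared error of predicting `constant` for every target."""
    return sum((t - constant) ** 2 for t in targets) / len(targets)


def mae(constant, targets):
    """Mean absolute error of predicting `constant` for every target."""
    return sum(abs(t - constant) for t in targets) / len(targets)


# Toy targets with one large outlier (made-up numbers).
targets = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate constant predictions: 0.0, 1.0, ..., 100.0.
candidates = [float(c) for c in range(0, 101)]

# "Training" here is just picking the candidate with the lowest loss.
best_under_mse = min(candidates, key=lambda c: mse(c, targets))
best_under_mae = min(candidates, key=lambda c: mae(c, targets))

print(best_under_mse)  # 22.0, the mean of the targets
print(best_under_mae)  # 3.0, the median of the targets
```

Same data, same model family, different loss: squared error pulls the prediction toward the outlier (the mean), while absolute error settles on the median.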

Common choices

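  • Mean squared error: for regression. It penalizes squared differences, so a few large errors dominate the total, which makes it sensitive to outliers.
  • Mean absolute error: for regression. It is more robust to outliers, but its gradient is undefined at zero error and has constant magnitude elsewhere, which can make optimization harder near the minimum.
  • Cross entropy: for classification. It penalizes confident wrong predictions heavily.
  • Hinge loss: for margin-based classifiers such as SVMs.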
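The four losses above can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions, not a reference implementation: the cross entropy shown is the binary form with labels in {0, 1}, the hinge loss expects labels in {-1, +1} and raw scores, and the function names and the `eps` clipping constant are choices made for this example.

```python
import math


def mean_squared_error(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def mean_absolute_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Labels are 0 or 1; probs are predicted probabilities of class 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def hinge_loss(scores, labels):
    """Labels are -1 or +1; scores are raw (unsquashed) model outputs."""
    return sum(max(0.0, 1.0 - y * s) for s, y in zip(scores, labels)) / len(labels)


# Squared error grows much faster than absolute error on a large miss.
print(mean_squared_error([0.0, 0.0], [1.0, 3.0]))   # 5.0
print(mean_absolute_error([0.0, 0.0], [1.0, 3.0]))  # 2.0

# Cross entropy: a confident wrong answer costs far more than an unsure one.
confident_wrong = binary_cross_entropy([0.99], [0])
unsure_wrong = binary_cross_entropy([0.6], [0])
print(confident_wrong > unsure_wrong)  # True

# Hinge: zero loss once the example is beyond the margin.
print(hinge_loss([2.0], [1]))  # 0.0
print(hinge_loss([0.5], [1]))  # 0.5
```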

Matching loss to task

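The loss encodes assumptions. Minimizing squared error is equivalent to maximum likelihood when the noise is Gaussian with fixed variance. Cross entropy treats the outputs as class probabilities, so it pairs with a softmax output for multiclass problems or a sigmoid output for binary ones.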

  • Choose a loss that reflects what mistakes cost.
  • Match the output activation to the loss: a linear output for squared error, a sigmoid or softmax for cross entropy.

A well-chosen loss makes the gradient point toward genuinely better models, which is half the battle in training.
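One concrete reason softmax and cross entropy are paired: the gradient of the combined loss with respect to the logits simplifies to "predicted probabilities minus the one-hot target". The sketch below, in plain Python, computes that analytic gradient and checks it against a finite-difference estimate. The logits, the target index, and the function names are made up for illustration.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


def cross_entropy_grad(logits, target_index):
    """Analytic gradient w.r.t. the logits: softmax(logits) - one_hot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]


logits = [2.0, 0.5, -1.0]  # hypothetical raw model outputs
target = 1                 # hypothetical correct class

grad = cross_entropy_grad(logits, target)

# Finite-difference check of the analytic gradient.
h = 1e-6
numeric = []
for i in range(len(logits)):
    bumped = list(logits)
    bumped[i] += h
    numeric.append((cross_entropy(bumped, target) - cross_entropy(logits, target)) / h)

max_gap = max(abs(g - n) for g, n in zip(grad, numeric))
print(max_gap < 1e-4)  # True
```

The gradient is negative for the correct class and positive for the others, so a gradient descent step raises the correct logit and lowers the rest, in proportion to how wrong the predicted probabilities are.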

Key idea

The loss function defines what the model optimizes, so matching it to the task and output activation is what makes gradients point toward genuinely better predictions.

Check yourself

1. Which loss is standard for multiclass classification?

2. Why is mean squared error sensitive to outliers?