Multiclass Strategies: One vs Rest

The challenge

Many algorithms are natively binary, separating just two classes. To handle three or more labels, we wrap them in a multiclass strategy.

One vs rest

In one vs rest, you train one binary classifier per class. Each one learns to separate its class from all the others combined.

To predict, run every classifier and pick the class whose score is highest.
You need only as many models as there are classes.
Each binary problem tends to be imbalanced, since the rest side pools every other class and usually outnumbers the single positive class.
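
The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the perceptron base learner, the function names (train_perceptron, fit_one_vs_rest, predict_one_vs_rest) and the toy data are all assumptions chosen for the example. Any binary classifier that returns a score could take the perceptron's place.

```python
def train_perceptron(X, y, max_epochs=1000):
    """Train a binary perceptron; y holds +1 / -1. Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            s = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * s <= 0:  # misclassified (or on the boundary): update
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b


def score(model, x):
    """Signed score of one binary model; larger means 'more like my class'."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_one_vs_rest(X, labels):
    """Train one binary model per class: that class (+1) vs all others (-1)."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if lab == c else -1 for lab in labels]
        models[c] = train_perceptron(X, y)
    return models


def predict_one_vs_rest(models, x):
    """Run every classifier and return the class with the highest score."""
    return max(models, key=lambda c: score(models[c], x))


# Toy data, made up for illustration: three well-separated 2-D clusters.
X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

models = fit_one_vs_rest(X, labels)          # three models for three classes
predictions = [predict_one_vs_rest(models, x) for x in X]
print(predictions)
```

One caveat about this sketch: raw perceptron scores are not calibrated against each other, so comparing them across classifiers is only a rough heuristic. Base learners that output probabilities or normalized margins make the "highest score" comparison more meaningful.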

One vs one

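The alternative, one vs one, trains one classifier for every pair of classes. To predict, each pairwise classifier votes for one of its two classes, and the class with the most votes wins.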

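It needs many more models: k(k - 1)/2 for k classes, so the count grows with the square of the class count (45 models for 10 classes).
Each model trains only on the examples of its two classes, a smaller slice that is typically better balanced than a one vs rest split.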
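
A rough sketch of the pairwise scheme follows. To keep it short, the binary learner here is a nearest-centroid rule, which is an assumption made for illustration only. The function names and the toy data are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_one_vs_one(X, labels):
    """Train one model per unordered class pair, using only those two classes."""
    classes = sorted(set(labels))
    models = {}
    for a, b in combinations(classes, 2):
        pts_a = [x for x, lab in zip(X, labels) if lab == a]
        pts_b = [x for x, lab in zip(X, labels) if lab == b]
        # The "model" for a pair is just the two class centroids.
        models[(a, b)] = (centroid(pts_a), centroid(pts_b))
    return models


def predict_one_vs_one(models, x):
    """Each pairwise model votes for one class; the most-voted class wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        winner = a if sq_dist(x, ca) <= sq_dist(x, cb) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: three well-separated 2-D clusters.
X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

models = fit_one_vs_one(X, labels)
n_classes = len(set(labels))
n_pairs = n_classes * (n_classes - 1) // 2   # k(k - 1)/2 pairwise models
predictions = [predict_one_vs_one(models, x) for x in X]
print(len(models), n_pairs, predictions)
```

Ties in the vote are possible in general. This sketch simply returns whichever tied class the Counter lists first, while real implementations usually break ties with the classifiers' confidence scores.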

Choosing

One vs rest is cheaper and the common default.
One vs one can help when pairwise boundaries are easier to learn.
Some models, such as softmax (multinomial logistic) regression, handle all classes jointly and need neither wrapper.
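
To make the "jointly" part concrete, here is a small sketch of the softmax function itself. It turns one score per class into a single probability distribution, so the classes compete within one model instead of across separate binary ones. The input scores are invented for the example, and a full softmax classifier would also have to learn the weights that produce them.

```python
import math


def softmax(scores):
    """Map one raw score per class to probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes from a single joint model.
probs = softmax([2.0, 1.0, 0.1])
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
print(probs, predicted)
```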

Key idea