The challenge
Many algorithms, such as the perceptron and support vector machines, are natively binary: they separate just two classes. To handle three or more labels, we wrap them in a multiclass strategy that combines several binary models.
One vs rest
In one vs rest, you train one binary classifier per class. Each one learns to separate its class from all the others combined.
- To predict, run every classifier and pick the class whose score is highest.
- You need only as many models as there are classes.
- Each training set is imbalanced by construction: the rest side pools every other class, so it is usually much larger than the positive side.
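The one vs rest procedure above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the binary base learner is a simple perceptron (any binary model that returns a real-valued score would do), and the names `Perceptron` and `OneVsRest` and the toy data are hypothetical, chosen for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


class Perceptron:
    """Hypothetical binary base learner: labels must be +1 or -1."""

    def __init__(self, max_epochs: int = 1000) -> None:
        self.max_epochs = max_epochs
        self.w: List[float] = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "Perceptron":
        self.w = [0.0] * len(X[0])
        self.b = 0.0
        for _ in range(self.max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * self.score(xi) <= 0:  # wrong side of (or on) the boundary
                    self.w = [wj + yi * xj for wj, xj in zip(self.w, xi)]
                    self.b += yi
                    mistakes += 1
            if mistakes == 0:  # every training point separated: stop early
                break
        return self

    def score(self, x: Vector) -> float:
        # Signed score: positive means "this class", negative means "rest".
        return sum(wj * xj for wj, xj in zip(self.w, x)) + self.b


class OneVsRest:
    """One binary model per class; predict the class with the highest score."""

    def __init__(self, make_model: Callable[[], Perceptron] = Perceptron) -> None:
        self.make_model = make_model
        self.classes: List[str] = []
        self.models: List[Perceptron] = []

    def fit(self, X: Sequence[Vector], y: Sequence[str]) -> "OneVsRest":
        self.classes = sorted(set(y))
        self.models = []
        for c in self.classes:
            # Relabel: class c becomes +1, every other class is pooled as -1.
            binary = [1 if label == c else -1 for label in y]
            self.models.append(self.make_model().fit(X, binary))
        return self

    def predict(self, x: Vector) -> str:
        scores = [model.score(x) for model in self.models]
        best = max(range(len(scores)), key=lambda i: scores[i])
        return self.classes[best]


# Toy data: three well-separated clusters (hypothetical).
X = [(-1, -4), (1, -4), (0, -3), (0, -5),
     (3, 2), (5, 2), (4, 1), (4, 3),
     (-3, 2), (-5, 2), (-4, 1), (-4, 3)]
y = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

ovr = OneVsRest().fit(X, y)
print(len(ovr.models))  # 3: one model per class
print([ovr.predict(p) for p in [(0, -4), (4, 2), (-4, 2)]])  # ['a', 'b', 'c']
```

The wrapper only calls `fit` and `score` on the base learner, which is what lets it turn any scoring binary model into a multiclass one.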
One vs one
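The alternative, one vs one, trains a classifier for every pair of classes. To predict, each pairwise classifier votes for one of its two classes, and the class with the most votes wins.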
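- It needs many more models: K(K-1)/2 for K classes, which grows with the square of the class count (45 models for 10 classes, versus 10 for one vs rest).
- Each model trains only on the examples from its two classes, a smaller slice of the data that is balanced whenever those two classes are similar in size.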
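A minimal sketch of one vs one in plain Python, assuming a perceptron as the pairwise base learner. The helper names (`train_perceptron`, `fit_one_vs_one`, `predict_one_vs_one`) and the four-class toy data are hypothetical; four classes are used so the pairwise model count visibly differs from the class count.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias)


def score(model: Model, x: Vector) -> float:
    """Signed score: positive favours the first class of the pair."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_perceptron(X: Sequence[Vector], y: Sequence[int], max_epochs: int = 1000) -> Model:
    """Hypothetical binary base learner; labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * score((w, b), xi) <= 0:  # misclassified: nudge the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # every point on the correct side
            break
    return w, b


def fit_one_vs_one(X: Sequence[Vector], y: Sequence[str]) -> Dict[Tuple[str, str], Model]:
    models: Dict[Tuple[str, str], Model] = {}
    for pos, neg in combinations(sorted(set(y)), 2):
        # Train only on rows from these two classes: pos -> +1, neg -> -1.
        rows = [(xi, 1 if yi == pos else -1) for xi, yi in zip(X, y) if yi in (pos, neg)]
        models[(pos, neg)] = train_perceptron([xi for xi, _ in rows], [t for _, t in rows])
    return models


def predict_one_vs_one(models: Dict[Tuple[str, str], Model], x: Vector) -> str:
    votes: Counter = Counter()
    for (pos, neg), model in models.items():
        # Each pairwise model casts one vote for the side x falls on.
        votes[pos if score(model, x) > 0 else neg] += 1
    return votes.most_common(1)[0][0]  # the class with the most votes


# Toy data: four well-separated clusters (hypothetical).
X = [(-1, -4), (1, -4), (0, -3), (0, -5),
     (3, 2), (5, 2), (4, 1), (4, 3),
     (-3, 2), (-5, 2), (-4, 1), (-4, 3),
     (-1, 6), (1, 6), (0, 5), (0, 7)]
y = ["a"] * 4 + ["b"] * 4 + ["c"] * 4 + ["d"] * 4

models = fit_one_vs_one(X, y)
print(len(models))  # 6 == 4 * 3 // 2 pairwise models
print([predict_one_vs_one(models, p) for p in [(0, -4), (4, 2), (-4, 2), (0, 6)]])  # ['a', 'b', 'c', 'd']
```

With four classes this trains 6 pairwise models where one vs rest would train 4, and each of the 6 sees only 8 of the 16 training points.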
Choosing
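- One vs rest trains far fewer models, so it is usually cheaper, and it is the common default.
- One vs one can help when pairwise boundaries are easier to learn, or when the base learner scales poorly with training-set size, since each model sees only two classes' data.
- Some models, such as softmax (multinomial) logistic regression, handle all classes jointly and need neither wrapper.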
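For contrast, a minimal sketch of softmax (multinomial logistic) regression in plain Python: a single model holds one weight vector and bias per class, and full-batch gradient descent updates all of them together against one cross-entropy loss, so no binary wrapper is involved. The class name `SoftmaxRegression`, the learning rate, the step count, and the toy data are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxRegression:
    """One model for all classes: a weight vector and bias per class,
    all fitted together against a single cross-entropy loss."""

    def __init__(self, lr: float = 0.05, steps: int = 2000) -> None:
        self.lr = lr
        self.steps = steps
        self.classes: List[str] = []
        self.W: List[List[float]] = []
        self.b: List[float] = []

    def predict_proba(self, x: Vector) -> List[float]:
        scores = [sum(w * v for w, v in zip(row, x)) + bias
                  for row, bias in zip(self.W, self.b)]
        return softmax(scores)  # one distribution over every class

    def predict(self, x: Vector) -> str:
        p = self.predict_proba(x)
        return self.classes[max(range(len(p)), key=lambda c: p[c])]

    def fit(self, X: Sequence[Vector], y: Sequence[str]) -> "SoftmaxRegression":
        self.classes = sorted(set(y))
        k, d, n = len(self.classes), len(X[0]), len(X)
        target = [self.classes.index(label) for label in y]
        self.W = [[0.0] * d for _ in range(k)]
        self.b = [0.0] * k
        for _ in range(self.steps):
            gW = [[0.0] * d for _ in range(k)]
            gb = [0.0] * k
            for xi, t in zip(X, target):
                p = self.predict_proba(xi)
                for c in range(k):
                    # d(loss)/d(score_c) = p_c - 1[c is the true class]
                    err = p[c] - (1.0 if c == t else 0.0)
                    gb[c] += err / n
                    for j in range(d):
                        gW[c][j] += err * xi[j] / n
            # One gradient step updates every class's parameters together.
            for c in range(k):
                self.b[c] -= self.lr * gb[c]
                for j in range(d):
                    self.W[c][j] -= self.lr * gW[c][j]
        return self


# Toy data: three well-separated clusters (hypothetical).
X = [(-1, -4), (1, -4), (0, -3), (0, -5),
     (3, 2), (5, 2), (4, 1), (4, 3),
     (-3, 2), (-5, 2), (-4, 1), (-4, 3)]
y = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

model = SoftmaxRegression().fit(X, y)
print([round(p, 3) for p in model.predict_proba((0, -4))])  # sums to 1 across all classes
print([model.predict(p) for p in [(0, -4), (4, 2), (-4, 2)]])
```

Unlike the wrappers, the class scores here are coupled through the shared softmax, so they always form a single probability distribution.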
Key idea
One vs rest builds one binary classifier per class and picks the highest score, a cheap multiclass wrapper. One vs one trains pairwise models instead, at higher cost.