The problem
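Some data cannot be separated by a straight line (or, in more than two dimensions, a flat hyperplane). We could map it into a much higher-dimensional space where it becomes linearly separable, but computing those huge feature vectors explicitly is expensive or, when the space is infinite-dimensional, impossible.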
The trick
Many algorithms, such as SVMs, only ever use dot products between pairs of points, never the coordinates of the points themselves. A kernel is a function k(x, y) that returns the dot product of the mapped points in the high-dimensional space, computed directly from the original inputs x and y.
- We get the geometry of the rich space without building the features.
- This is cheaper than building the features, and it even lets the space be infinite-dimensional, as with the radial basis function kernel.
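A small worked check can make the trick concrete. The sketch below assumes 2-D inputs and the degree-2 kernel k(x, y) = (x . y)^2, whose explicit feature map is (x1^2, sqrt(2) x1 x2, x2^2). The names dot, phi and poly2_kernel are illustrative, not taken from any library.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative assumption)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """Same quantity computed from the original inputs, with no feature vector."""
    return dot(x, y) ** 2

x = (1.0, 2.0)
y = (3.0, -1.0)

explicit = dot(phi(x), phi(y))   # build the features, then take the dot product
implicit = poly2_kernel(x, y)    # kernel: one dot product in 2-D, then square

# The two agree up to floating-point rounding.
print(math.isclose(explicit, implicit))
```

Here the explicit map only has 3 features, so the saving is small. The number of features grows quickly with input dimension and degree, while the kernel stays one dot product plus a power.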
Common kernels
- The linear kernel is just the plain dot product.
- The polynomial kernel adds interaction terms up to a chosen degree.
- The radial basis function (RBF) kernel scores similarity by distance: it is 1 for identical points and decays toward 0 as they move apart. It is a strong default.
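The three kernels above can be sketched in a few lines. The parameter names degree, coef0 and gamma and their default values are assumptions chosen for illustration; libraries differ in naming and defaults.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def linear_kernel(x, y):
    # Just the ordinary dot product in the original space.
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    # (x . y + coef0)^degree; with coef0 > 0 this includes terms of every
    # order up to `degree`.
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # exp(-gamma * squared Euclidean distance); larger gamma means
    # similarity falls off faster with distance.
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

a = (1.0, 2.0)
b = (3.0, 4.0)
print(linear_kernel(a, b))
print(polynomial_kernel(a, b, degree=2))
print(rbf_kernel(a, a), rbf_kernel(a, b))
```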
What to remember
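- A valid kernel must correspond to an inner product in some feature space. Equivalently, it must be symmetric and positive semidefinite: the matrix of kernel values for any finite set of points has no negative eigenvalues.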
- Kernels let simple linear methods carve curved boundaries.
- The cost shifts to comparing pairs of points: n training points need on the order of n^2 kernel evaluations, which is slow and memory-hungry on huge datasets.
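The three points above can be seen together in a toy example. This is a minimal sketch, assuming an RBF kernel with gamma = 2 and a kernel perceptron standing in for an SVM (it is simpler to write out, and it also touches the data only through kernel values). The data is the XOR pattern, which no straight line separates. All function names are illustrative.

```python
import math
import random

def rbf_kernel(x, y, gamma=2.0):
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel):
    # n x n table of pairwise kernel values: this is the quadratic cost.
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, z):
    # z^T K z; for a positive semidefinite K this is never negative.
    n = len(K)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))

def train_kernel_perceptron(K, labels, epochs=10):
    # alpha[i] counts how often point i was misclassified during training.
    n = len(labels)
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * labels[j] * K[j][i] for j in range(n))
            if labels[i] * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(x, points, labels, alpha, kernel):
    score = sum(a * y * kernel(p, x) for a, y, p in zip(alpha, labels, points))
    return 1 if score > 0 else -1

# XOR: opposite corners share a label.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, 1, 1, -1]

K = gram_matrix(points, rbf_kernel)

# Spot check only: random vectors can reveal a violation, not prove validity.
rng = random.Random(0)
psd_spot_check = all(
    quadratic_form(K, [rng.uniform(-1, 1) for _ in points]) >= -1e-12
    for _ in range(100)
)

alpha = train_kernel_perceptron(K, labels)
predictions = [predict(p, points, labels, alpha, rbf_kernel) for p in points]
print(psd_spot_check, predictions == labels)
```

The learner is linear in the feature space, yet the boundary it draws in the original plane is curved. The Gram matrix has one entry per pair of points, so its size grows with the square of the dataset.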
Key idea
The kernel trick replaces explicit high-dimensional features with a function that returns their dot product, giving linear methods curved boundaries cheaply.