Machine Learning

The Kernel Trick

Get the power of high dimensional features without ever computing them.

5 min read · advanced

The problem

Some data cannot be separated by a straight line (or, in more dimensions, a flat hyperplane). Mapping it into a much higher dimensional space can make it separable, but computing those huge feature vectors explicitly is expensive, and impossible when the space is infinite dimensional.
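A tiny illustration of the problem, as a sketch rather than a general test. The four points, their labels, and the helper `threshold_separates` are made up for this example. On the original axis no single cut splits the classes, but after lifting each point to its square one cut does.

```python
import numpy as np

# Toy 1-D data (made up): the inner points are one class, the outer points the other.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1, 1, 1, -1])

def threshold_separates(values, labels):
    """True if a single cut on this axis puts each class on its own side."""
    pos, neg = values[labels == 1], values[labels == -1]
    return pos.max() < neg.min() or neg.max() < pos.min()

print(threshold_separates(x, y))       # False: no cut works on the raw axis
print(threshold_separates(x ** 2, y))  # True: the lifted feature x^2 separates them
```

Here the lifted space has only one extra feature. The cost problem appears when the useful space has thousands or infinitely many.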

The trick

Many algorithms, such as SVMs, can be written so that they use only the dot products between pairs of points, never the points themselves. A kernel is a function k(x, z) that returns the dot product of the mapped points in the high dimensional space, computed directly from the original inputs x and z.

  • We get the geometry of the rich space without building the features.
  • This is usually cheaper than building the features, and it even lets the space be infinite dimensional.
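The trick can be checked by hand in a small case. The sketch below assumes 2-D inputs and the degree 2 kernel k(x, z) = (x · z)^2. The names `phi` and `poly2_kernel` are illustrative only. It builds the explicit feature map once, then shows that the kernel gives the same number without it.

```python
import numpy as np

def phi(v):
    """Explicit degree 2 feature map for a 2-D input (illustrative)."""
    v1, v2 = v
    return np.array([v1 ** 2, np.sqrt(2.0) * v1 * v2, v2 ** 2])

def poly2_kernel(x, z):
    """Same dot product, computed from the original inputs only."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # build the features, then take the dot product
implicit = poly2_kernel(x, z)      # never build the features

print(np.isclose(explicit, implicit))  # True, both equal 121
```

With 2-D inputs the saving is trivial. For inputs with many dimensions and a higher degree, the explicit map grows very quickly while the kernel still costs one dot product.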

Common kernels

  • The linear kernel is just the plain dot product.
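  • The polynomial kernel raises the dot product (plus an optional constant) to a chosen degree, which corresponds to interaction terms up to that degree.
  • The radial basis function (RBF, or Gaussian) kernel scores similarity with a value that decays from 1 toward 0 as the distance between two points grows, and is a strong default.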
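The three kernels above can be sketched in a few lines. The function names and the parameter names `degree`, `coef0`, and `gamma` are choices made for this example, and the default values are arbitrary.

```python
import numpy as np

def linear_kernel(x, z):
    """Plain dot product."""
    return float(np.dot(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x . z + coef0) ** degree"""
    return float((np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * squared Euclidean distance)"""
    diff = np.asarray(x) - np.asarray(z)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(linear_kernel(x, z))      # 11.0
print(polynomial_kernel(x, z))  # 144.0
print(rbf_kernel(x, x))         # 1.0, a point is maximally similar to itself
```

In the RBF kernel, a larger `gamma` makes similarity fall off faster with distance, so it usually needs tuning.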

What to remember

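  • A valid kernel must correspond to an inner product in some feature space. This holds when the kernel is symmetric and positive semidefinite: every matrix of pairwise kernel values has no negative eigenvalues.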
  • Kernels let simple linear methods carve curved boundaries.
  • The cost shifts to comparing pairs of points: n training points need up to n² kernel evaluations, which can be slow and memory hungry on huge datasets.
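Both the validity check and the pairwise cost show up in the matrix of kernel values, often called the Gram matrix. The sketch below uses random data and an arbitrary RBF width of 0.5. Passing on one sample does not prove a kernel is valid in general, but a single failure proves it is not.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # 50 random points, for illustration only

# Squared distances between every pair of points: an n-by-n matrix.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

def looks_psd(K, tol=1e-8):
    """Symmetric with no eigenvalue below -tol (tolerance absorbs rounding)."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

K_rbf = np.exp(-0.5 * sq_dists)  # RBF kernel values
K_bad = -sq_dists                # a similarity score that is not a valid kernel

print(K_rbf.shape)       # (50, 50): n points give n * n entries
print(looks_psd(K_rbf))  # True
print(looks_psd(K_bad))  # False
```

The matrix grows with the square of the number of points, which is where the cost on huge datasets comes from.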

Key idea

The kernel trick replaces explicit high dimensional features with a function that returns their dot product, giving curved boundaries cheaply.
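To see a linear method carve a curved boundary, here is a minimal sketch that swaps an SVM for a simpler algorithm, the kernel perceptron, since it also touches the data only through kernel values. The XOR data, the function names, and the settings (`gamma`, `max_epochs`) are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

def linear_kernel(a, b):
    return float(np.dot(a, b))

def train_kernel_perceptron(X, y, kernel, max_epochs=20):
    """Learn one mistake count per training point, using only kernel values."""
    n = len(X)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    alpha = np.zeros(n)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            score = np.sum(alpha * y * K[:, i])
            if y[i] * score <= 0:  # wrong side, or exactly on the boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(X_train, y_train, alpha, kernel, x):
    score = sum(a * t * kernel(p, x) for a, t, p in zip(alpha, y_train, X_train))
    return 1 if score > 0 else -1

# XOR: no straight line separates the two classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])

alpha_rbf = train_kernel_perceptron(X, y, rbf_kernel)
alpha_lin = train_kernel_perceptron(X, y, linear_kernel)
rbf_preds = [predict(X, y, alpha_rbf, rbf_kernel, x) for x in X]
lin_preds = [predict(X, y, alpha_lin, linear_kernel, x) for x in X]

print(rbf_preds)               # [-1, 1, 1, -1], matches the labels
print(lin_preds == y.tolist()) # False, the linear kernel cannot fit XOR
```

The training loop is identical in both runs. Only the kernel changes, and with it the shape of the boundary.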

Check yourself

1. What does the kernel trick avoid computing?

2. Which property must a valid kernel satisfy?