Primers • Kernel Trick
Kernel/Polynomial Trick

In classic mathematical induction, the goal of a classification task is to learn a function \(y = f(x; \theta)\) aka fit a model (or a curve) to the data with a decision boundary that separates the classes (say, approve/reject loan applications).

The diagram below from Prithvi Da summarizes the approaches.

Fig A: If the data points are linearly separable, it’s simple you can fit a linear model using algorithms like Logistic regression or SVM.

Fig B: What do you do if the data points are NOT linearly separable? of course we can fit a Neural network which in theory can learn any Borel measurable function under the sky. But we have a simple solution especially if you have a smaller tabular dataset.

Fig C: Shows you can simply magically transform a line to a parabola and then learn a function that linearly separates the data points. This idea of applying a nonlinear transformation using a special function called a “kernel” to convert linear input space (\(y= mx+c\), 1stdegree polynomial) to a high degree polynomial (\(x^2 = 4ay\), 2nddegree polynomial) is the meat of the kernel trick. Now learning a linear model in the higher dimensional space is equivalent to learning a nonlinear model in the original input space.

TL;DR  It pays in spades to bend the problem, literally sometimes.
References
Citation
If you found our work useful, please cite it as:
@article{Chadha2020DistilledKernelTrick,
title = {Kernel Trick},
author = {Chadha, Aman},
journal = {Distilled AI},
year = {2020},
note = {\url{https://aman.ai}}
}