If you have been curious about the Support Vector Machine classifier but too intimidated to dig in, this explainer will get you started with the basics of this machine learning algorithm.
Support Vector Machine, SVM for short, is a supervised learning classifier that looks for the boundary separating two groups with the widest possible margin, meaning the largest distance between the boundary and the nearest points of each group (problems with more than two groups are typically handled by combining several two-group classifiers). To better understand what that means, let’s look at some visualizations:
In the above image, we have two distinct groups that we want to classify. Let’s say we want to draw a line between these two groups to separate them. Does it matter which of the above three lines we choose?
You might instinctively say the middle line is the best of the three, and you would be right in this case, but why? Don’t all three lines clearly separate the two groups?
Although the outside lines also separate the two groups, we must remember that this is only the training data. What happens when the test data looks slightly different? The next image illustrates this point:
As you can see, the line we draw definitely matters when it comes to the test data! If we had chosen the line on the right instead of the middle line, our model would have been overfit to the training data and would have misclassified the test point marked with the arrow.
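To make this concrete, here is a minimal sketch in plain Python. The coordinates, the three vertical candidate lines, and the helper names (`predict`, `training_errors`) are all made up for illustration; they are not taken from the images. All three lines separate the toy training data perfectly, yet only the line that hugs one group gets the new test point wrong.

```python
# Hypothetical toy training data: each point is (x, y).
group_a = [(1.0, 1.0), (1.5, 3.0), (2.0, 2.0)]
group_b = [(6.0, 1.5), (6.5, 3.0), (7.0, 2.0)]

# Three candidate vertical lines x = c (assumed values for illustration).
candidates = {"left": 2.5, "middle": 4.0, "right": 5.5}


def predict(point, c):
    """Label a point 'A' if it lies left of the line x = c, otherwise 'B'."""
    return "A" if point[0] < c else "B"


def training_errors(c):
    """Count how many training points the line x = c puts on the wrong side."""
    wrong = sum(predict(p, c) != "A" for p in group_a)
    wrong += sum(predict(p, c) != "B" for p in group_b)
    return wrong


# A new test point that truly belongs to group B,
# but sits a little closer to group A than any training point did.
test_point = (5.0, 2.0)

results = {name: predict(test_point, c) for name, c in candidates.items()}

for name, c in candidates.items():
    print(f"{name:>6}: training errors = {training_errors(c)}, "
          f"test point predicted as {results[name]}")
```

In this toy setup, every candidate has zero training errors, so the training data alone cannot tell the lines apart. The difference only shows up on the unseen point.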
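Using this example, we can see why the Support Vector Machine classifier strives to draw a line that maximizes the distance between the line and the closest points of each group. That distance is called the margin, and a wide margin leaves room for test points that fall slightly outside the training clusters, which makes the algorithm less likely to overfit to the training data.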
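The idea of picking the line with the widest margin can be sketched as follows. This is not how an SVM is actually trained (a real SVM searches over all possible lines with an optimizer); it only compares three hand-picked candidates. The data points, the candidate lines, and the helper names `distance` and `margin` are assumptions made for this example.

```python
import math

# Hypothetical toy training data: each point is (x, y).
group_a = [(1.0, 1.0), (1.5, 3.0), (2.0, 2.0)]
group_b = [(6.0, 1.5), (6.5, 3.0), (7.0, 2.0)]

# Each candidate line is stored as (w1, w2, b), meaning w1*x + w2*y + b = 0.
# These three are the vertical lines x = 2.5, x = 4.0 and x = 5.5.
candidates = {
    "left": (1.0, 0.0, -2.5),
    "middle": (1.0, 0.0, -4.0),
    "right": (1.0, 0.0, -5.5),
}


def distance(point, line):
    """Perpendicular distance from a point to the line w1*x + w2*y + b = 0."""
    w1, w2, b = line
    return abs(w1 * point[0] + w2 * point[1] + b) / math.hypot(w1, w2)


def margin(line):
    """Distance from the line to the closest training point of either group."""
    return min(distance(p, line) for p in group_a + group_b)


margins = {name: margin(line) for name, line in candidates.items()}
best = max(margins, key=margins.get)

for name, value in margins.items():
    print(f"{name:>6}: margin = {value}")
print("widest margin:", best)
```

With these made-up numbers, the middle line sits equally far from the nearest point of each group, so it has the widest margin of the three candidates.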