Support Vector Machine - Finding the Optimal Separating Line
Support Vector Machine (SVM) is a machine learning algorithm that finds the best decision boundary for separating data into classes.
For example, suppose we need to classify emails as spam or not spam.
If analyzing the email data shows that spam and regular emails form distinct groups, SVM looks for the best line (a hyperplane) that separates the two groups.
📌 What is the best line to separate two groups?
SVM does more than just classify data: it finds the line that maximizes the margin (the gap between the separating line and the closest points of each group).
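In the standard formulation (an assumption here, since the lesson gives no formulas), the separating line is written as w·x + b = 0, each email gets a label y_i = +1 (spam) or -1 (regular), and w and b are scaled so the closest points satisfy y_i(w·x_i + b) = 1. The distance from a point x to the line is |w·x + b| / ‖w‖, so those closest points sit 1/‖w‖ away on each side. The margin therefore depends only on ‖w‖, and maximizing it becomes a small optimization problem:

```latex
\text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert},
\qquad
\min_{\mathbf{w},\, b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 \quad \text{for all } i
```

Minimizing ‖w‖² instead of maximizing 2/‖w‖ gives the same line and is easier to solve.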
Data Point | Mail Type | Word Count | Domain Trust |
---|---|---|---|
A | Spam | 100 | Low |
B | Spam | 90 | Low |
C | Regular | 30 | High |
D | Regular | 40 | High |
If we represent the above data as coordinates, the X-axis can be Word Count and the Y-axis can be Domain Trust.
In the graph above, each element signifies:
- Red ✖ → Spam email
- Blue ✖ → Regular email
- X-axis: Word Count
- Y-axis: Domain Trust
- Bold black line → Decision Boundary (optimal line separating spam and regular emails)
- Two dashed lines → Margin (distance between the decision boundary and support vectors)
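Domain Trust is categorical, so the table needs numbers before it can be plotted or given to an SVM. The sketch below shows one way to do that. The encodings (Low = 0, High = 1, spam = +1, regular = -1) and dividing word count by 100 are our own assumptions, not part of the lesson. Scaling matters because SVM works with distances, so an axis with much larger numbers would otherwise dominate the margin.

```python
# Rows from the table: (data point, mail type, word count, domain trust)
rows = [
    ("A", "Spam", 100, "Low"),
    ("B", "Spam", 90, "Low"),
    ("C", "Regular", 30, "High"),
    ("D", "Regular", 40, "High"),
]

# Assumed numeric encodings (the lesson does not specify any):
TRUST_LEVEL = {"Low": 0.0, "High": 1.0}  # Y-axis values
LABEL = {"Spam": +1, "Regular": -1}      # class labels used by SVM


def to_point(word_count, domain_trust):
    """Map one email to (x, y); word count is scaled by 1/100 (an assumption)."""
    return (word_count / 100, TRUST_LEVEL[domain_trust])


points = [to_point(words, trust) for _, _, words, trust in rows]
labels = [LABEL[mail_type] for _, mail_type, _, _ in rows]
print(points)
print(labels)
```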
SVM finds the line that optimally separates spam from regular emails.
The data points closest to this line are referred to as Support Vectors.
Support Vectors are the critical data points that define the decision boundary; if they change, the boundary changes.
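To make Support Vectors concrete, the sketch below encodes the four emails with an assumed scheme (word count / 100, Low = 0, High = 1) and uses the line 0.8x - 1.6y + 0.28 = 0. We derived this line by hand as the maximum-margin line for these points: B and D are the closest spam/regular pair, and the line is their perpendicular bisector. Measuring every point against it shows that only B and D lie on the margin, so they are the Support Vectors. A and C could move farther away without changing the boundary.

```python
import math

# Emails A-D under an assumed encoding: x = word count / 100, y = Low 0 / High 1.
emails = {
    "A": ((1.00, 0.0), +1),  # spam
    "B": ((0.90, 0.0), +1),  # spam
    "C": ((0.30, 1.0), -1),  # regular
    "D": ((0.40, 1.0), -1),  # regular
}

# Max-margin line for this data, derived by hand: w . x + b = 0
w = (0.8, -1.6)
b = 0.28
w_norm = math.hypot(*w)

support_vectors = []
for name, ((x1, x2), y) in emails.items():
    score = w[0] * x1 + w[1] * x2 + b
    distance = abs(score) / w_norm  # perpendicular distance to the line
    print(f"{name}: y*score = {y * score:.2f}, distance = {distance:.3f}")
    # Support vectors lie exactly on the dashed margin lines, where y*score == 1.
    if math.isclose(y * score, 1.0):
        support_vectors.append(name)

margin_width = 2 / w_norm  # gap between the two dashed lines
print("support vectors:", support_vectors)
print(f"margin width: {margin_width:.3f}")
```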
How the Support Vector Machine Works
The process by which SVM classifies data is as follows.
1. Finding the Hyperplane
SVM identifies the hyperplane that best divides the data.
In 2D, this hyperplane is a line, and in 3D, it becomes a plane.
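Concretely, a hyperplane is the set of points where a linear function equals zero. Writing out the 2D and 3D cases shows why it is a line and then a plane (w and b are the usual notation, assumed here):

```latex
\begin{aligned}
&\text{general } d: && \mathbf{w}^{\top}\mathbf{x} + b = 0 \\
&d = 2: && w_1 x_1 + w_2 x_2 + b = 0 \quad \text{(a line)} \\
&d = 3: && w_1 x_1 + w_2 x_2 + w_3 x_3 + b = 0 \quad \text{(a plane)}
\end{aligned}
```

The vector w is perpendicular to the hyperplane, and the sign of w·x + b tells which side of it a point x lies on.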
2. Maximizing the Margin
SVM maximizes the distance between the hyperplane and the nearest data points (the Support Vectors).
A wider margin leaves more room between the groups, which tends to make the classification of new data more reliable.
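To see margin maximization run, here is a minimal sketch that trains a linear SVM from scratch. It uses full-batch subgradient descent on the soft-margin objective (λ/2)‖w‖² + (1/n)Σ max(0, 1 - y_i(w·x_i + b)). This is one simple training method, not what SVM libraries use internally; scikit-learn's SVC, for example, is built on libsvm. The data encoding, λ = 0.01, learning rate 0.1, and 1,000 epochs are illustrative assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Fit (w, b) for a 2D linear SVM by full-batch subgradient descent.

    Objective: (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w . x + b))).
    labels must be +1 or -1; b is not regularized.
    """
    n = len(points)
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        # The regularization gradient pulls ||w|| down, which widens the margin 2 / ||w||.
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x1, x2), y in zip(points, labels):
            # Only points inside the margin (or misclassified) have nonzero hinge loss.
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:
                grad_w[0] -= y * x1 / n
                grad_w[1] -= y * x2 / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, point):
    """Return +1 (spam) on the positive side of the line, -1 (regular) otherwise."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score >= 0 else -1


# Emails A-D under an assumed encoding: word count / 100, Low trust = 0, High trust = 1.
points = [(1.0, 0.0), (0.9, 0.0), (0.3, 1.0), (0.4, 1.0)]
labels = [1, 1, -1, -1]  # +1 = spam, -1 = regular

w, b = train_linear_svm(points, labels)
print("w =", w, "b =", b)
print("predictions:", [predict(w, b, p) for p in points])
```

With a small λ the hinge term dominates, so on this cleanly separable data the learned line should approach the maximum-margin one. Gradient descent only approximates it, so expect w and b to be close to the optimum, not equal to it.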
SVM then classifies new input data by checking which side of the decision boundary it falls on, labeling it as spam or regular mail.
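Classifying a new email then reduces to a sign check on w·x + b. The sketch reuses a line we derived by hand as the maximum-margin line for emails A to D, under an assumed encoding (word count / 100, Low = 0, High = 1). `classify_email` is a hypothetical helper name.

```python
# Assumed encoding (not specified in the lesson): x = word count / 100,
# y = domain trust with Low = 0 and High = 1.
TRUST_LEVEL = {"Low": 0.0, "High": 1.0}

# Hand-derived max-margin line for emails A-D under that encoding:
# 0.8 * x - 1.6 * y + 0.28 = 0
W = (0.8, -1.6)
BIAS = 0.28


def classify_email(word_count, domain_trust):
    """Hypothetical helper: label an email by which side of the line it falls on."""
    x = word_count / 100
    y = TRUST_LEVEL[domain_trust]
    score = W[0] * x + W[1] * y + BIAS
    # Positive side of the decision boundary → spam; negative side → regular.
    return "Spam" if score >= 0 else "Regular"


print(classify_email(120, "Low"))  # → Spam
print(classify_email(25, "High"))  # → Regular
```

With this line, Domain Trust dominates: a Low-trust email is classified as spam at any word count, while a High-trust email needs roughly 165 words or more to cross the boundary.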
Support Vector Machines are powerful algorithms for finding clear boundaries and are used in various fields such as image classification, text classification, and more.
In the next session, we'll explore the k-means clustering algorithm.