
Classifying with the K-Nearest Neighbors Algorithm

K-Nearest Neighbors (KNN) is a machine learning algorithm that classifies new data by referring to the K closest data points around it.

Here, closeness is measured by converting each data point's features into numerical coordinates and calculating the distance between them.


For example, to determine whether a fruit is an apple or a pear, you check which other fruits in the existing data it is closest to.


📌 Creating Coordinates for Fruit Data

We can represent fruits like apples, bananas, and oranges using the following features.

| Fruit  | Color (Red=1, Yellow=2, Orange=3) | Size (cm) | Weight (g) |
|--------|-----------------------------------|-----------|------------|
| Apple  | 1                                 | 8         | 180        |
| Banana | 2                                 | 15        | 120        |
| Orange | 3                                 | 10        | 200        |

Now each fruit can be represented as a 3D coordinate (x, y, z) = (Color, Size, Weight):

  • Apple → (1, 8, 180)
  • Banana → (2, 15, 120)
  • Orange → (3, 10, 200)

If a new fruit has features (Color=1, Size=9, Weight=170), its coordinates are closest (shortest distance) to those of the Apple, so it is most likely an apple.
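To verify this numerically, here is a minimal sketch in plain Python (the coordinates come from the table above; the distance function is the Euclidean distance introduced in the next section):

```python
import math

# Known fruits as (Color, Size, Weight) coordinates from the table above
fruits = {
    "Apple": (1, 8, 180),
    "Banana": (2, 15, 120),
    "Orange": (3, 10, 200),
}

new_fruit = (1, 9, 170)  # Color=1, Size=9, Weight=170

def distance(a, b):
    # Euclidean distance between two points of equal dimension
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for name, point in fruits.items():
    print(f"{name}: {distance(new_fruit, point):.2f}")
# Apple: 10.05, Banana: 50.37, Orange: 30.08 → Apple is the closest
```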

The K-Nearest Neighbors algorithm classifies data by adjusting K, the number of nearby data points it compares against.

For instance, if K=3, the algorithm will classify based on the closest 3 data points.

The concept behind K-Nearest Neighbors is simple, yet it is a powerful predictor that is often used in supervised learning.
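As a quick preview of KNN in practice, here is a minimal sketch using scikit-learn's KNeighborsClassifier on the toy fruit table above; K is passed as n_neighbors, and we use K=1 only because there are just three training points:

```python
from sklearn.neighbors import KNeighborsClassifier

# Training data: (Color, Size, Weight) rows from the fruit table
X = [[1, 8, 180], [2, 15, 120], [3, 10, 200]]
y = ["Apple", "Banana", "Orange"]

# K=1 because we only have three training points;
# with a real dataset you would typically try K=3, 5, ...
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

print(model.predict([[1, 9, 170]]))  # ['Apple']
```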


How the K-Nearest Neighbors Algorithm Works

The KNN algorithm operates in the following sequence.


1. Calculate Distance When New Data Is Entered

When new data is provided, the algorithm calculates its distance to every existing data point to find the closest K points.

The most commonly used distance measure is the Euclidean distance.

Below is the Euclidean distance formula for calculating the distance between two points in 2D (x, y) space.

$$\text{Distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

Based on this distance, the K-Nearest Neighbors algorithm selects the closest K data points.
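A minimal sketch of this step in plain Python (euclidean and k_nearest are illustrative names, not from any particular library):

```python
import math

def euclidean(p, q):
    # Straight-line distance between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_nearest(new_point, data, k):
    # data is a list of (point, label) pairs;
    # sort by distance to new_point and keep the k closest labels
    by_distance = sorted(data, key=lambda pair: euclidean(new_point, pair[0]))
    return [label for _, label in by_distance[:k]]

data = [((1, 8, 180), "Apple"), ((2, 15, 120), "Banana"), ((3, 10, 200), "Orange")]
print(k_nearest((1, 9, 170), data, k=2))  # ['Apple', 'Orange']
```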


2. Majority Vote

Among the K neighbors, the class that appears the most is predicted as the class of the new data.

For example, if K=5, the final prediction is made by choosing the class with the most occurrences among the 5 closest data points.

KNN example:

```text
New data: ?
Closest data points (K=5): [Apple, Pear, Apple, Apple, Pear]
Result: 3 Apples, 2 Pears → classified as Apple
```

This method allows the K-Nearest Neighbors algorithm to classify data in a simple yet intuitive way.
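In Python, the vote itself can be a one-liner with collections.Counter; this sketch reuses the K=5 neighbor list from the example above:

```python
from collections import Counter

# Labels of the 5 nearest neighbors from the example above
neighbors = ["Apple", "Pear", "Apple", "Apple", "Pear"]

# most_common(1) returns [(label, count)] for the most frequent label
prediction = Counter(neighbors).most_common(1)[0][0]
print(prediction)  # Apple
```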


Pros and Cons of K-Nearest Neighbors

Because it stores the training data directly, the K-Nearest Neighbors algorithm can immediately reflect newly added data, and with a sufficiently large K it is relatively robust to noisy data points.

Moreover, it can be used for both Classification and Regression tasks.
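For regression, the prediction is typically the average of the K nearest neighbors' values instead of a majority vote. Here is a minimal sketch with scikit-learn's KNeighborsRegressor (the numbers below are made up for illustration):

```python
from sklearn.neighbors import KNeighborsRegressor

# Toy data (illustrative): predict weight (g) from size (cm)
X = [[8], [10], [12], [15]]
y = [180, 200, 150, 120]

model = KNeighborsRegressor(n_neighbors=2)
model.fit(X, y)

# Prediction for size 9 is the mean of the 2 nearest targets (180 and 200)
print(model.predict([[9]]))  # [190.]
```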

However, comparing new data with every existing data point is computationally intensive, so prediction slows down as the dataset grows.

Additionally, the choice of K greatly affects the outcome, and performance may suffer if the data is sparse or if features are on very different scales (so feature scaling is usually needed).


In the next lesson, we will explore Support Vector Machine (SVM).
