
Classifying with the K-Nearest Neighbors Algorithm

K-Nearest Neighbors (KNN) is a machine learning algorithm that classifies new data by referring to the K closest data points around it.

Here, closeness is measured by converting each data point's features into numerical coordinates and calculating the distance between them.


For example, to determine whether a fruit is an apple or a pear, you check which other fruits in the existing data it is closest to.


📌 Creating Coordinates for Fruit Data

We can represent fruits like apples, bananas, and oranges using the following features.

| Fruit  | Color (Red=1, Yellow=2, Orange=3) | Size (cm) | Weight (g) |
|--------|-----------------------------------|-----------|------------|
| Apple  | 1                                 | 8         | 180        |
| Banana | 2                                 | 15        | 120        |
| Orange | 3                                 | 10        | 200        |

Now each fruit can be represented as a 3D coordinate (x, y, z) = (Color, Size, Weight):

  • Apple → (1, 8, 180)
  • Banana → (2, 15, 120)
  • Orange → (3, 10, 200)

If a new fruit has features (Color=1, Size=9, Weight=170), its coordinates are closest (shortest distance) to those of the Apple, so it is most likely an apple.
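To verify this numerically, here is a minimal sketch in plain Python (the coordinates come from the table above; the distance function is the Euclidean distance introduced in the next section):

```python
import math

# Known fruits as (Color, Size, Weight) coordinates from the table above
fruits = {
    "Apple": (1, 8, 180),
    "Banana": (2, 15, 120),
    "Orange": (3, 10, 200),
}

new_fruit = (1, 9, 170)  # Color=1, Size=9, Weight=170

def distance(a, b):
    # Euclidean distance between two points of equal dimension
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for name, point in fruits.items():
    print(f"{name}: {distance(new_fruit, point):.2f}")
# Apple: 10.05, Banana: 50.37, Orange: 30.08 → Apple is the closest
```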

The K-Nearest Neighbors algorithm classifies data by adjusting K, the number of nearby data points it compares against.

For instance, if K=3, the algorithm will classify based on the closest 3 data points.

The concept behind K-Nearest Neighbors is simple, yet it is a powerful predictor that is often used in supervised learning.
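As a quick preview of KNN in practice, here is a minimal sketch using scikit-learn's KNeighborsClassifier on the toy fruit table above; K is passed as n_neighbors, and we use K=1 only because there are just three training points:

```python
from sklearn.neighbors import KNeighborsClassifier

# Training data: (Color, Size, Weight) rows from the fruit table
X = [[1, 8, 180], [2, 15, 120], [3, 10, 200]]
y = ["Apple", "Banana", "Orange"]

# K=1 because we only have three training points;
# with a real dataset you would typically try K=3, 5, ...
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

print(model.predict([[1, 9, 170]]))  # ['Apple']
```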


How the K-Nearest Neighbors Algorithm Works

The KNN algorithm operates in the following sequence.


1. Calculate Distance When New Data Is Entered

When new data is provided, the algorithm calculates its distance to every existing data point to find the closest K points.

The most commonly used distance measure is the Euclidean distance.

Below is the Euclidean distance formula for calculating the distance between two points in 2D (x, y) space.

$$\text{Distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

Based on this distance, the K-Nearest Neighbors algorithm selects the closest K data points.
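A minimal sketch of this step in plain Python (euclidean and k_nearest are illustrative names, not from any particular library):

```python
import math

def euclidean(p, q):
    # Straight-line distance between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_nearest(new_point, data, k):
    # data is a list of (point, label) pairs;
    # sort by distance to new_point and keep the k closest labels
    by_distance = sorted(data, key=lambda pair: euclidean(new_point, pair[0]))
    return [label for _, label in by_distance[:k]]

data = [((1, 8, 180), "Apple"), ((2, 15, 120), "Banana"), ((3, 10, 200), "Orange")]
print(k_nearest((1, 9, 170), data, k=2))  # ['Apple', 'Orange']
```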


2. Majority Vote

Among the K neighbors, the class that appears the most is predicted as the class of the new data.

For example, if K=5, the final prediction is made by choosing the class with the most occurrences among the 5 closest data points.

KNN example:

```text
New data: ?
Closest data points (K=5): [Apple, Pear, Apple, Apple, Pear]
Result: 3 Apples, 2 Pears → classified as Apple
```

This method allows the K-Nearest Neighbors algorithm to classify data in a simple yet intuitive way.
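In Python, the vote itself can be a one-liner with collections.Counter; this sketch reuses the K=5 neighbor list from the example above:

```python
from collections import Counter

# Labels of the 5 nearest neighbors from the example above
neighbors = ["Apple", "Pear", "Apple", "Apple", "Pear"]

# most_common(1) returns [(label, count)] for the most frequent label
prediction = Counter(neighbors).most_common(1)[0][0]
print(prediction)  # Apple
```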


Pros and Cons of K-Nearest Neighbors

Because it stores the training data directly, the K-Nearest Neighbors algorithm can immediately reflect newly added data, and with a sufficiently large K it is relatively robust to noisy data points.

Moreover, it can be used for both Classification and Regression tasks.
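For regression, the prediction is typically the average of the K nearest neighbors' values instead of a majority vote. Here is a minimal sketch with scikit-learn's KNeighborsRegressor (the numbers below are made up for illustration):

```python
from sklearn.neighbors import KNeighborsRegressor

# Toy data (illustrative): predict weight (g) from size (cm)
X = [[8], [10], [12], [15]]
y = [180, 200, 150, 120]

model = KNeighborsRegressor(n_neighbors=2)
model.fit(X, y)

# Prediction for size 9 is the mean of the 2 nearest targets (180 and 200)
print(model.predict([[9]]))  # [190.]
```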

However, comparing new data with every existing data point is computationally intensive, so prediction slows down as the dataset grows.

Additionally, the choice of K greatly affects the outcome, and performance may suffer if the data is sparse or if features are on very different scales (so feature scaling is usually needed).


In the next lesson, we will explore Support Vector Machine (SVM).
