
Classification with K-Nearest Neighbors


K-Nearest Neighbors (KNN) is one of the simplest machine learning algorithms for classification.
It classifies a new data point by taking a majority vote among the classes of its k nearest neighbors in the training set.


How KNN Works

  1. Store the entire training dataset.
  2. For a new data point:
    • Calculate the distance to all training samples (commonly Euclidean distance).
    • Select the k closest neighbors.
    • Assign the most common class among those neighbors (a minimal from-scratch sketch of these steps follows below).
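
To make these steps concrete, here is a minimal from-scratch sketch (assuming NumPy is available; the function name knn_predict and the toy data are purely illustrative). In practice you would normally use scikit-learn's implementation, as shown in the next section.

Minimal KNN From Scratch
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Compute the Euclidean distance from x_new to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Take the indices of the k closest samples
    nearest = np.argsort(distances)[:k]
    # 3. Return the most common class among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two well-separated classes in 2D
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # two of the three neighbors are class 0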

Example: KNN on the Iris Dataset

KNN Classification Example
# Install scikit-learn when running in JupyterLite (not needed in a regular Python environment)
import piplite
await piplite.install('scikit-learn')

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create KNN model
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Predict
y_pred = knn.predict(X_test)

# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Choosing the Value of k

  • Small k → More flexible, but sensitive to noise.
  • Large k → Smoother decision boundaries, but may underfit.

A good approach is to try different k-values and choose the one that gives the best validation accuracy.
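
As a rough sketch of that idea (reusing X_train and y_train from the example above), you could compare cross-validated accuracy for several candidate values of k:

Choosing k with Cross-Validation
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Compare several k values using 5-fold cross-validation on the training set
for k in (1, 3, 5, 7, 9, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")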


Key Takeaways

  • KNN is non-parametric — no explicit model training beyond storing the data.
  • Works well for small datasets, but can be slow for large datasets.
  • Scaling features is important for distance-based algorithms like KNN (see the pipeline sketch below).
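
For example, one common pattern (a sketch reusing the Iris train/test split from above) is to standardize features inside a pipeline, so the scaler is fit only on the training data:

Feature Scaling with KNN
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Standardize features (zero mean, unit variance) before computing distances
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_train, y_train)
print(f"Accuracy with scaled features: {scaled_knn.score(X_test, y_test):.2f}")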

What’s Next?

In the next lesson, we’ll explore Regression with Linear Models.

Want to learn more?

Join CodeFriends Plus membership or enroll in a course to start your journey.