Classification with K-Nearest Neighbors
K-Nearest Neighbors (KNN) is one of the simplest machine learning algorithms for classification.
It classifies a data point based on the majority class of its nearest neighbors in the training set.
How KNN Works
- Store the entire training dataset.
- For a new data point:
  - Calculate the distance to all training samples (commonly Euclidean distance).
  - Select the k closest neighbors.
  - Assign the most common class among those neighbors.
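To make these steps concrete, here is a minimal from-scratch sketch in Python. The function name knn_predict and the tiny two-feature dataset are illustrative only, not part of any library; it assumes Euclidean distance and a simple majority vote.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 1: distance from the new point to every training sample (Euclidean).
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: indices of the k closest neighbors.
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote among the neighbors' labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny illustrative dataset: two features, two classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # likely 0

In practice you would use a library implementation such as scikit-learn's KNeighborsClassifier, as in the example below.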
Example: KNN on the Iris Dataset
# Install scikit-learn in Jupyter Lite
import piplite
await piplite.install('scikit-learn')
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create KNN model
knn = KNeighborsClassifier(n_neighbors=3)
# Train the model
knn.fit(X_train, y_train)
# Predict
y_pred = knn.predict(X_test)
# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))
Choosing the Value of k
- Small k → more flexible, but sensitive to noise.
- Large k → smoother decision boundaries, but may underfit.
A good approach is to try different k-values and choose the one that gives the best validation accuracy.
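As a rough sketch of that approach, you can compare a few candidate values of k with scikit-learn's cross_val_score; this assumes the X_train and y_train variables from the Iris example above.

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Evaluate several candidate values of k with 5-fold cross-validation
# on the training split, then pick the k with the highest mean accuracy.
for k in [1, 3, 5, 7, 9, 11]:
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")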
Key Takeaways
- KNN is non-parametric — no explicit model training beyond storing the data.
- Works well for small datasets, but can be slow for large datasets.
- Scaling features is important for distance-based algorithms.
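As a sketch of the last point about feature scaling, one common pattern is to put StandardScaler and the classifier in a scikit-learn Pipeline, so the scaling fitted on the training data is reused at prediction time; this reuses the train/test variables from the Iris example above.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Standardize features (zero mean, unit variance) before distances are computed,
# then fit KNN on the scaled training data.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_train, y_train)
print(f"Accuracy with scaling: {scaled_knn.score(X_test, y_test):.2f}")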
What’s Next?
In the next lesson, we’ll explore Regression with Linear Models.
Want to learn more?
Join CodeFriends Plus membership or enroll in a course to start your journey.