Classification with K-Nearest Neighbors
K-Nearest Neighbors (KNN) is one of the simplest and most intuitive machine learning algorithms for classification tasks.
It predicts the class of a new data point by looking at the majority label among its nearest neighbors in the training data.
How KNN Works
- Store the entire training dataset.
- For a new data point:
  - Calculate the distance to all training samples (commonly Euclidean distance).
  - Select the k closest neighbors.
  - Assign the most common class among those neighbors.
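To make these steps concrete, here is a minimal from-scratch sketch of a single KNN prediction. It is purely illustrative: the predict_one helper, the NumPy-array inputs, and k=3 are assumptions for this example, not part of the Scikit-learn workflow shown below.
A Minimal KNN Sketch
import numpy as np
from collections import Counter
# Illustrative helper: X_train and y_train are assumed to be NumPy arrays
def predict_one(x_new, X_train, y_train, k=3):
    # Euclidean distance from the new point to every training sample
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]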
Example: KNN on the Iris Dataset
Let’s use Scikit-learn to apply KNN classification to the classic Iris dataset.
KNN Classification Example
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create KNN model
knn = KNeighborsClassifier(n_neighbors=3)
# Train the model
knn.fit(X_train, y_train)
# Predict
y_pred = knn.predict(X_test)
# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("Classification Report:", classification_report(y_test, y_pred))
Choosing the Value of k
- Small k → captures local patterns but may be sensitive to noise.
- Large k → produces smoother decision boundaries but may underfit.
A good approach is to try different k-values and choose the one that gives the best validation accuracy.
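One way to do this is to cross-validate a range of candidate values. The sketch below continues the Iris example and uses Scikit-learn's cross_val_score; the odd values from 1 to 15 are just an illustrative choice.
Comparing k Values with Cross-Validation
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
# Compare odd values of k by 5-fold cross-validation on the training data
for k in range(1, 16, 2):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")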
Key Takeaways
- KNN is non-parametric: it doesn't assume an underlying data distribution and requires no explicit model training.
- It performs well on small to medium datasets, but can be computationally expensive for large ones.
- Always scale your features: KNN relies on distances, so unscaled data can distort results.
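Since the last point is easy to overlook, here is a minimal sketch of how scaling could be added to the Iris example with a Pipeline and StandardScaler. The preprocessing choice is an assumption for illustration, not part of the original example.
Scaling Features with a Pipeline
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
# Standardize features before KNN computes its distances
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_train, y_train)
print(f"Accuracy with scaling: {scaled_knn.score(X_test, y_test):.2f}")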