Confusion Matrix and Classification Report
When working with classification models, accuracy alone isn’t always enough to judge performance — especially if your dataset is imbalanced (e.g., predicting rare diseases).
Two useful tools for deeper analysis are:
- Confusion Matrix – A table showing correct and incorrect predictions for each class.
- Classification Report – Provides precision, recall, F1-score, and support for each class.
Why Use Them?
- Confusion Matrix reveals where your model is making mistakes.
- Precision tells you how many predicted positives were correct.
- Recall tells you how many actual positives were correctly identified.
- F1-score balances precision and recall into a single number.
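The definitions above can be checked with a quick back-of-the-envelope calculation. The counts below are made up purely for illustration:

```python
# Hypothetical binary-classification counts: 8 true positives,
# 2 false positives, 3 false negatives.
tp, fp, fn = 8, 2, 3

precision = tp / (tp + fp)   # share of predicted positives that were correct
recall = tp / (tp + fn)      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```

Because the F1-score is a harmonic mean, it stays low unless precision and recall are both reasonably high.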
Example
```python
# Install scikit-learn in JupyterLite
import piplite
await piplite.install('scikit-learn')

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train/test split (stratified so class proportions are preserved)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train a KNN model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Confusion Matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification Report: precision, recall, F1-score, and support per class
print("\nClassification Report:\n", classification_report(y_test, y_pred))
```
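To see how the classification report's per-class numbers relate to the matrix, here is a sketch that derives precision and recall directly from a hypothetical 3-class confusion matrix (the counts are illustrative, not the actual output of the script above):

```python
import numpy as np

# Hypothetical 3-class confusion matrix:
# rows = true class, columns = predicted class.
cm = np.array([
    [15, 0, 0],
    [0, 13, 2],
    [0, 1, 14],
])

# Recall for class i: correct predictions / row sum (all true members of class i).
recall = cm.diagonal() / cm.sum(axis=1)

# Precision for class i: correct predictions / column sum (all predictions of class i).
precision = cm.diagonal() / cm.sum(axis=0)

print("per-class recall:   ", np.round(recall, 2))
print("per-class precision:", np.round(precision, 2))
```

Reading the matrix this way makes it clear which pairs of classes the model confuses: here, the off-diagonal entries show classes 1 and 2 being mistaken for each other.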
Key Takeaways
- Use confusion matrices to visualize misclassifications.
- Precision and recall help understand performance beyond accuracy.
- The F1-score is especially useful for imbalanced datasets.
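To see why accuracy alone can mislead on imbalanced data, consider a toy sketch (the 95/5 split is invented for illustration) where a model simply predicts the majority class every time:

```python
# Imbalanced toy labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the positive class: no positives were found at all.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Despite 95% accuracy, the model never detects a single positive case, which is exactly the failure mode recall and F1-score expose.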
What’s Next?
In the next lesson, we’ll introduce K-Means clustering as our first unsupervised learning algorithm.
Want to learn more?
Join the CodeFriends Plus membership or enroll in a course to start your journey.