Evaluating Classification Models

Model evaluation measures how effectively a trained model makes predictions on unseen data.

The right evaluation metric depends on the type of task:

Classification: Accuracy, Precision, Recall, F1-score
Regression: R² (coefficient of determination), mean squared error (MSE), mean absolute error (MAE)

True Positive, True Negative, False Positive, False Negative

Each prediction made by a binary classifier falls into one of four outcomes, depending on what the model predicted and what was actually true:

True Positive (TP): The model predicts positive and the case is actually positive (e.g., predicting a woman is pregnant when she actually is)
True Negative (TN): The model predicts negative and the case is actually negative (e.g., predicting a woman is not pregnant when she actually isn’t)
False Positive (FP): The model predicts positive but the case is actually negative (e.g., predicting a woman is pregnant when she isn’t)
False Negative (FN): The model predicts negative but the case is actually positive (e.g., predicting a woman is not pregnant when she actually is)
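
To make the four outcomes concrete, here is a minimal sketch that counts them by hand. The labels and the helper name confusion_counts are made up for illustration (1 stands for "pregnant", 0 for "not pregnant"); they are not part of the lesson's dataset.

```python
# Hypothetical labels for illustration: 1 = positive (pregnant), 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # what is actually the case
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # what the model predicted


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted != positive and actual != positive:
            tn += 1  # predicted negative, actually negative
        elif predicted == positive and actual != positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```

Scikit-learn's confusion_matrix() function in sklearn.metrics produces the same four counts as a 2x2 array, so a hand-written loop like this is mainly useful for understanding what is being counted.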

Classification Metrics

Commonly used evaluation metrics for classification models include:

Accuracy: The ratio of correct predictions to total predictions. Formula: (TP + TN) / (TP + TN + FP + FN)
Precision: The proportion of positive predictions that are actually correct. Formula: TP / (TP + FP)
Recall: The proportion of actual positives that are correctly identified. Formula: TP / (TP + FN)
F1-score: The harmonic mean of precision and recall. Formula: 2 * (precision * recall) / (precision + recall)
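
The four formulas above can be applied directly to a set of outcome counts. The sketch below uses hypothetical counts chosen only for illustration, and the helper name classification_metrics is likewise made up. Returning 0.0 when a denominator is zero is an assumption of this sketch, not a rule stated in the lesson.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from outcome counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Assumption: fall back to 0.0 when a denominator would be zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN (100 predictions in total)
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.850
# precision: 0.889
# recall: 0.800
# f1: 0.842
```

Here precision is higher than recall because the model produces fewer false positives (5) than false negatives (10).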

Scikit-learn provides built-in functions for these metrics in its sklearn.metrics module.

Example: Calculating Accuracy Score

Let’s evaluate a simple K-Nearest Neighbors classification model using the accuracy metric.

Accuracy Example
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset (Iris dataset)
X, y = load_iris(return_X_y=True)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Evaluate accuracy
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")

In this example, Scikit-learn’s accuracy_score() function returns the fraction of test-set predictions that match the true labels.

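You can also use precision_score(), recall_score(), or f1_score() to compute other metrics depending on your model’s goals. For a multiclass dataset such as Iris, these functions need an averaging strategy passed through the average parameter, for example average="macro".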
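
As a rough sketch of how the other metrics could be computed for the same K-Nearest Neighbors setup, the block below repeats the training steps so that it runs on its own. Iris has three classes, so the default binary averaging does not apply. Using average="macro" (the unweighted mean of the per-class scores) is one reasonable choice assumed here, not the only one.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

# Same data, split, and model as in the accuracy example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Assumption: macro averaging, which treats all three classes equally
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")

print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1-score:  {f1:.2f}")
```

Other values such as average="weighted" or average="micro" combine the per-class results differently, which matters most when the classes are imbalanced.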
