What is Cross-Validation?
Cross-validation is a model evaluation technique that tests how well a model generalizes to unseen data.
Instead of using a single train-test split, cross-validation divides the dataset into multiple folds, training and testing the model several times on different subsets.
In k-fold cross-validation:
- The data is divided into k folds.
- For each fold:
  - Train the model on the other k-1 folds.
  - Test it on the remaining fold.
- Average the results to get a more reliable performance estimate.
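The steps above can be written out by hand. The following is a minimal sketch, not a reference implementation: it assumes scikit-learn's `KFold` splitter, the iris dataset, and a logistic regression model, all chosen here only for illustration. Shuffling is turned on because the iris rows are ordered by class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# Divide the data into k = 5 folds (shuffled, with a fixed seed for repeatability).
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Train a fresh model on the k-1 training folds.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    # Test it on the remaining fold and record the accuracy.
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

# Average the per-fold results into a single estimate.
mean_score = float(np.mean(fold_scores))
print(f"Per-fold accuracy: {np.round(fold_scores, 3)}")
print(f"Mean accuracy: {mean_score:.3f}")
```

In practice, a helper such as `cross_val_score` performs this loop for you; writing it out mainly helps to see what is being averaged.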
Common Cross-Validation Types
- K-Fold Cross-Validation: The most common form; splits the data into k folds of (nearly) equal size.
- Stratified K-Fold: Maintains class proportions in each fold (important for classification).
- Leave-One-Out (LOO): Each observation is used once as a single-item test set.
- ShuffleSplit: Repeated random train/test splits; unlike k-fold, test sets from different splits may overlap.
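To see how these strategies differ, the sketch below builds each one with scikit-learn's splitter classes and inspects the first test fold on the iris dataset (150 samples, 50 per class, stored in class order). The parameter values (`n_splits=5`, `test_size=0.2`, `random_state=0`) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, ShuffleSplit, StratifiedKFold

X, y = load_iris(return_X_y=True)

splitters = {
    "KFold": KFold(n_splits=5),
    "StratifiedKFold": StratifiedKFold(n_splits=5),
    "LeaveOneOut": LeaveOneOut(),
    "ShuffleSplit": ShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
}

summary = {}
for name, splitter in splitters.items():
    n_splits = splitter.get_n_splits(X, y)
    # Look only at the first test fold to keep the output short.
    _, test_idx = next(iter(splitter.split(X, y)))
    class_counts = np.bincount(y[test_idx], minlength=3).tolist()
    summary[name] = (n_splits, class_counts)
    print(f"{name}: {n_splits} splits, first test fold class counts = {class_counts}")
```

Without shuffling, the first plain `KFold` test fold contains only one class because the data is sorted by label, while `StratifiedKFold` keeps the three classes balanced. This is why stratification (or at least shuffling) matters for classification data.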
Example: Comparing Models with Cross-Validation
In this example, both models are evaluated using 5-fold cross-validation, and the one with the higher average accuracy is considered better.
Cross-Validation Example
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Define models
log_reg = LogisticRegression(max_iter=200)
knn = KNeighborsClassifier(n_neighbors=5)

# Cross-validation
log_scores = cross_val_score(log_reg, X, y, cv=5)
knn_scores = cross_val_score(knn, X, y, cv=5)

print(f"Logistic Regression mean score: {log_scores.mean():.3f}")
print(f"KNN mean score: {knn_scores.mean():.3f}")
```
This example uses 5-fold cross-validation to compare two models and select the one with the higher average accuracy.
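When two mean scores are close, it can help to look at how much the scores vary from fold to fold. The sketch below is one possible way to report this, using the same two models and dataset as the example. The standard deviation across folds is only a rough indicator of spread, not a formal confidence interval.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    # scoring="accuracy" is stated explicitly; it is also the default for classifiers.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the difference between the means is small compared with the fold-to-fold spread, the data may not clearly favor either model.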
Key Takeaways
- Model selection aims to choose the model that best balances accuracy and efficiency for the task.
- Cross-validation gives a more robust estimate of real-world performance.
- Always use the same cross-validation strategy when comparing models to ensure fairness.
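One way to apply these takeaways is to create a single splitter object and reuse it for every candidate model, so that all models are scored on exactly the same folds. The sketch below assumes scikit-learn's `StratifiedKFold` and `cross_validate`; the seed value and the model settings are illustrative choices. `cross_validate` also records fit times, which gives a rough view of efficiency alongside accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One splitter with a fixed seed, reused for every model,
# so each model sees identical train/test folds.
shared_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

comparison = {}
for name, model in candidates.items():
    out = cross_validate(model, X, y, cv=shared_cv, scoring="accuracy")
    comparison[name] = {
        "mean_accuracy": out["test_score"].mean(),
        "mean_fit_time": out["fit_time"].mean(),
    }
    print(
        f"{name}: accuracy={comparison[name]['mean_accuracy']:.3f}, "
        f"fit time={comparison[name]['mean_fit_time']:.4f}s"
    )
```

Fit times on a dataset this small are noisy, so treat them as a rough guide only.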