Introduction to Scikit-learn
Scikit-learn (also known as sklearn) is one of the most popular open-source Python libraries for machine learning.
It provides efficient tools for:
- Classification
- Regression
- Clustering
- Dimensionality reduction
- Model selection
- Data preprocessing
Built on top of NumPy, SciPy, and Matplotlib, Scikit-learn is designed to be simple, efficient, and accessible for both beginners and professionals.
Why Use Scikit-learn?
Here are some key reasons why Scikit-learn is a go-to library for ML:
- Comprehensive Algorithms – Includes a wide variety of supervised and unsupervised learning methods.
- Easy-to-Use API – Consistent interface across models: estimators share methods such as fit, predict, and score.
- Preprocessing Tools – Built-in utilities for scaling, encoding, and transforming data.
- Model Evaluation – Ready-to-use metrics and validation tools.
- Integration – Accepts NumPy arrays and pandas DataFrames as input.
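To make a few of these points concrete, here is a minimal sketch (not part of the original lesson) that assumes the built-in iris dataset and two arbitrarily chosen classifiers. The names `candidates` and `mean_scores` are illustrative only. It combines a preprocessing step (`StandardScaler`), the shared estimator interface, and a ready-made validation tool (`cross_val_score`):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features and labels as NumPy arrays
X, y = load_iris(return_X_y=True)

# Two different algorithms, each preceded by the same scaling step.
# (Illustrative choices; any scikit-learn classifier could be used here.)
candidates = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

mean_scores = {}
for name, estimator in candidates.items():
    # cross_val_score fits and scores the estimator on 5 train/validation splits
    scores = cross_val_score(estimator, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

Because both pipelines expose the same interface, the loop body does not need to know which algorithm it is evaluating.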
Example: Training a Simple Model
Example: K-Nearest Neighbors Classification
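# Install scikit-learn in JupyterLite (browser-based notebooks only).
# In a regular Python environment, skip these two lines and run
# `pip install scikit-learn` instead.
import piplite
await piplite.install('scikit-learn')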
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
# Create and train model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
# Evaluate: for classifiers, score() returns the mean accuracy on the given data
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")
This example shows how little code is needed to:
- Load a dataset
- Split it into training and testing sets
- Train a machine learning model
- Evaluate its performance
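As a possible follow-up to the steps above, the sketch below repeats the same setup and then uses the trained model to make predictions. The variable names `y_pred`, `manual_accuracy`, and `predicted_names` are illustrative and do not come from the lesson. It also checks that `accuracy_score` gives the same number as `model.score`:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same data split and model as in the example above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# predict() returns one class index per test row
y_pred = model.predict(X_test)

# Computing accuracy by hand from the predictions matches model.score()
manual_accuracy = accuracy_score(y_test, y_pred)

# Map the first five predicted class indices to species names
predicted_names = iris.target_names[y_pred[:5]]
print(predicted_names)
print(f"Accuracy: {manual_accuracy:.2f}")
```

Calling `predict` directly is useful when you need the individual predictions rather than a single summary score.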
What’s Next?
In the next lesson, we’ll explore the Machine Learning Workflow and understand the main steps from data preparation to model deployment.