
Introduction to Scikit-learn

Scikit-learn (imported as sklearn) is a leading open-source Python library for machine learning and data analysis.

It provides efficient tools for:

  • Classification
  • Regression
  • Clustering
  • Dimensionality reduction
  • Model selection
  • Data preprocessing

Built on NumPy and SciPy, and commonly used alongside Matplotlib for plotting, Scikit-learn is designed to be simple, efficient, and accessible for both beginners and professionals.
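
To make two of the tasks listed above more concrete, here is a minimal sketch of dimensionality reduction followed by clustering. The choice of the Iris dataset, PCA, KMeans, and the parameter values (2 components, 3 clusters) are assumptions made for illustration only; other estimators would work just as well.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Iris has 150 samples with 4 numeric features each
X = load_iris().data

# Dimensionality reduction: project the 4 features onto 2 principal components
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the samples into 3 clusters without using the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_2d)

print(X_2d.shape)         # (150, 2)
print(cluster_ids.shape)  # (150,)
```

Note that neither step uses the target labels, which is what makes both of them unsupervised.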


Why Use Scikit-learn?

Here are the main reasons why Scikit-learn is a go-to library for machine learning in Python:

  • Comprehensive algorithms — includes a wide range of supervised and unsupervised learning models
  • Consistent API — uniform interface for model training and evaluation
  • Data preprocessing — built-in tools for scaling, encoding, and feature transformation
  • Model evaluation — ready-to-use metrics and validation utilities
  • Seamless integration — works natively with NumPy arrays and Pandas DataFrames
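
The consistent API point is easiest to see in code. The sketch below, which is an illustration and not a recommended benchmark, trains three different classifiers with the same `fit` and `score` calls, each preceded by a scaling step. The dictionary name `candidates`, the specific estimators, and their parameter values are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical selection of models; any classifier could be swapped in here
candidates = {
    "knn": KNeighborsClassifier(n_neighbors=3),
    "tree": DecisionTreeClassifier(random_state=0),
    "logreg": LogisticRegression(max_iter=1000),
}

scores = {}
for name, estimator in candidates.items():
    # Preprocessing and model are chained; the pipeline has the same fit/score interface
    model = make_pipeline(StandardScaler(), estimator)
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```

Because every estimator exposes the same methods, swapping one model for another only changes the line that constructs it.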

Installing Scikit-learn

You can install Scikit-learn using the following command:

pip install scikit-learn

After installing Scikit-learn, import it in your Python code with the following statement:

import sklearn
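
A quick way to confirm that the installation worked is to print the installed version. This is a small sanity check, and the exact version string depends on your environment.

```python
import sklearn

# __version__ is a plain string; its value depends on the installed release
print(sklearn.__version__)
```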

Example: Training a Simple ML Model

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Create and train model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Evaluate
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")

This example shows how little code is needed to:

  1. Load a dataset
  2. Split it into training and testing sets
  3. Train a machine learning model
  4. Evaluate its performance
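
The evaluation step can also use Scikit-learn's validation utilities. As a hedged sketch, the snippet below scores the same kind of classifier with 5-fold cross-validation, which averages accuracy over several splits and does not depend on one particular train/test split. The choice of 5 folds is an assumption for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3)

# 5-fold cross-validation: the model is fitted and scored on 5 different splits
cv_scores = cross_val_score(model, X, y, cv=5)

print(cv_scores.shape)  # (5,)
print(f"Mean accuracy: {cv_scores.mean():.2f}")
```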
