Feature Scaling and Preprocessing
In machine learning, feature scaling and preprocessing ensure that all features contribute on a comparable scale and that the data is in the right format for learning. Without scaling, models like KNN or gradient descent-based algorithms can be biased toward features with larger numeric ranges.
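To make this concrete, consider two hypothetical points whose second feature spans a far larger range than the first: the Euclidean distance KNN relies on is determined almost entirely by that feature. The values below are invented for illustration:
import numpy as np
# Two hypothetical points: feature 1 differs by 1, feature 2 by 700
a = np.array([1.0, 200.0])
b = np.array([2.0, 900.0])
# The large-range feature dominates the Euclidean distance
print(np.linalg.norm(a - b))   # ~700.0007
print(abs(a[1] - b[1]))        # 700.0 -- nearly the entire distance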
Common Preprocessing Steps
Feature Scaling: Normalize or standardize values so they're on a similar scale.
Encoding Categorical Variables: Convert text labels into numbers (see the sketch after this list).
Handling Missing Values: Replace or remove nulls.
Feature Transformation: Apply mathematical transformations (log, polynomial, etc.).
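Scaling itself is demonstrated in the next example; for the other three steps, the snippet below is a minimal sketch using Scikit-learn's OneHotEncoder and SimpleImputer plus a log transform. The toy arrays and their values are invented for illustration:
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
# Hypothetical columns: a text category and a numeric column with a gap
colors = np.array([["red"], ["green"], ["red"]])
ages = np.array([[25.0], [np.nan], [40.0]])
# Encoding categorical variables: text labels -> one-hot numeric columns
encoder = OneHotEncoder()
colors_encoded = encoder.fit_transform(colors).toarray()  # [[0,1],[1,0],[0,1]]
# Handling missing values: replace the NaN with the column mean
imputer = SimpleImputer(strategy="mean")
ages_imputed = imputer.fit_transform(ages)  # NaN -> 32.5
# Feature transformation: log1p compresses large, skewed values
ages_log = np.log1p(ages_imputed)
print(colors_encoded, ages_imputed, ages_log)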
Example: Standardization and Normalization
The following example shows how to scale features in Scikit-learn.
Scaling Features in Scikit-learn
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Example dataset
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
# Standardization (mean=0, std=1)
scaler_std = StandardScaler()
X_std = scaler_std.fit_transform(X)
# Normalization (range [0, 1])
scaler_mm = MinMaxScaler()
X_mm = scaler_mm.fit_transform(X)
print("Standardized Data:", X_std)
print("Min-Max Scaled Data:", X_mm)
Choosing the Right Scaling Method
The following are the two most common scaling methods:
Standardization: Best for algorithms that assume Gaussian-like distributions (e.g., logistic regression, SVM).
Normalization: Best for distance-based models (e.g., KNN) and for models that expect bounded inputs (e.g., neural networks).
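Whichever method you choose, the scaler should be fit on the training split only and then applied to the test split with the same learned parameters; Scikit-learn's Pipeline handles this automatically. A minimal sketch with KNN, using a synthetic dataset invented for illustration:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Synthetic classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# The pipeline fits the scaler on the training fold only, then scales
# the test data with the same learned parameters -- no leakage
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))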