Feature Scaling and Preprocessing
In machine learning, feature scaling and preprocessing ensure that all features contribute on a comparable scale and that the data is in the right format for learning. Without scaling, models like KNN or gradient descent-based algorithms can be biased toward features with larger numeric ranges.
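To make this concrete, consider two hypothetical points whose second feature spans a far larger range than the first: the Euclidean distance KNN relies on is determined almost entirely by that feature. The values below are invented for illustration:
import numpy as np
# Two hypothetical points: feature 1 differs by 1, feature 2 by 700
a = np.array([1.0, 200.0])
b = np.array([2.0, 900.0])
# The large-range feature dominates the Euclidean distance
print(np.linalg.norm(a - b))   # ~700.0007
print(abs(a[1] - b[1]))        # 700.0 -- nearly the entire distance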
Common Preprocessing Steps
Feature Scaling: Normalize or standardize values so they're on a similar scale.
Encoding Categorical Variables: Convert text labels into numbers (see the sketch after this list).
Handling Missing Values: Replace or remove nulls.
Feature Transformation: Apply mathematical transformations (log, polynomial, etc.).
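Scaling itself is demonstrated in the next example; for the other three steps, the snippet below is a minimal sketch using Scikit-learn's OneHotEncoder and SimpleImputer plus a log transform. The toy arrays and their values are invented for illustration:
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
# Hypothetical columns: a text category and a numeric column with a gap
colors = np.array([["red"], ["green"], ["red"]])
ages = np.array([[25.0], [np.nan], [40.0]])
# Encoding categorical variables: text labels -> one-hot numeric columns
encoder = OneHotEncoder()
colors_encoded = encoder.fit_transform(colors).toarray()  # [[0,1],[1,0],[0,1]]
# Handling missing values: replace the NaN with the column mean
imputer = SimpleImputer(strategy="mean")
ages_imputed = imputer.fit_transform(ages)  # NaN -> 32.5
# Feature transformation: log1p compresses large, skewed values
ages_log = np.log1p(ages_imputed)
print(colors_encoded, ages_imputed, ages_log)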
Example: Standardization and Normalization
The following example shows how to scale features in Scikit-learn.
Scaling Features in Scikit-learn
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Example dataset
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
# Standardization (mean=0, std=1)
scaler_std = StandardScaler()
X_std = scaler_std.fit_transform(X)
# Normalization (range [0, 1])
scaler_mm = MinMaxScaler()
X_mm = scaler_mm.fit_transform(X)
print("Standardized Data:", X_std)
print("Min-Max Scaled Data:", X_mm)
Choosing the Right Scaling Method
The following are the two most common scaling methods:
Standardization: Best for algorithms that assume Gaussian-like distributions (e.g., logistic regression, SVM).
Normalization: Best for distance-based models (e.g., KNN) and for models that expect bounded inputs (e.g., neural networks).
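Whichever method you choose, the scaler should be fit on the training split only and then applied to the test split with the same learned parameters; Scikit-learn's Pipeline handles this automatically. A minimal sketch with KNN, using a synthetic dataset invented for illustration:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Synthetic classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# The pipeline fits the scaler on the training fold only, then scales
# the test data with the same learned parameters -- no leakage
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))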