Rescaling Data through Normalization
In this session, we will explore Normalization and Standardization, which are methods to rescale data.
What Does Data Scale Mean?
In the context of machine learning, the scale of data refers not to physical units (like length or weight) but to the magnitude of the numerical values (their range and distribution).
For example, let's assume we store data about students' heights in inches and weights in pounds.
- Height: 63, 67, 71 (inches)
- Weight: 120, 132, 143 (pounds)
The height has a range of 63-71, and the weight has a range of 120-143.
If a particular machine learning model is sensitive to the magnitude of numerical values, the feature with the relatively larger values, here weight, might have a greater impact on model training.
This means that even if height should be a more important feature for the AI model, the larger weight values might be weighted more heavily during training.
When the absolute magnitudes of numbers in a dataset differ between features, AI models may fail to treat those features fairly.
To resolve such issues, a process called scaling is used to adjust data scales to a consistent range.
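To make the effect of raw scale concrete, here is a minimal sketch. It assumes a model that compares students by squared Euclidean distance; that model choice and the variable names are illustrative assumptions, not part of the lesson. It measures how much each feature contributes to the distance between the first and last student.

```python
# Illustrative sketch: how raw scale affects a distance-based comparison.
# The use of squared Euclidean distance is an assumption for illustration.
heights = [63, 67, 71]      # inches
weights = [120, 132, 143]   # pounds

# Squared difference between the first and last student, per feature.
height_part = (heights[2] - heights[0]) ** 2   # (71 - 63)^2
weight_part = (weights[2] - weights[0]) ** 2   # (143 - 120)^2

# Fraction of the total squared distance that comes from weight alone.
total = height_part + weight_part
weight_share = weight_part / total

print(f"height contributes {height_part}, weight contributes {weight_part}")
print(f"weight share of squared distance: {weight_share:.2f}")
```

In this data the weight values span a wider range than the height values, so weight accounts for most of the distance even though nothing says it matters more.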
Ways to Rescale Data
Two common methods for scaling are Normalization and Standardization.
In this session, we'll focus on Normalization.
1. Normalization
Normalization is a method to adjust data values to a range between 0 and 1.
It commonly uses Min-Max Scaling, transforming data with the following formula:
New Value = (Original Value - Min Value) / (Max Value - Min Value)
For instance, if you normalize the height 67, you get:
(67 - 63) / (71 - 63) = 4 / 8 = 0.5
Students' heights (63-71) can be converted to a 0-1 range as follows:
Height Value (inches) | Normalized Height Value |
---|---|
63 | 0.0 |
67 | 0.5 |
71 | 1.0 |
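The formula can be sketched in a few lines of plain Python. The function name `min_max_normalize` is hypothetical, and the handling of a column whose values are all equal is an assumption of this sketch, since the formula itself would divide by zero in that case.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the 0-1 range with Min-Max Scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: if every value is identical, (max - min) is zero,
        # so this sketch maps every value to 0.0 instead of dividing.
        return [0.0 for _ in values]
    # New Value = (Original Value - Min Value) / (Max Value - Min Value)
    return [(v - lo) / (hi - lo) for v in values]


heights = [63, 67, 71]  # inches
normalized_heights = min_max_normalize(heights)
print(normalized_heights)  # [0.0, 0.5, 1.0]
```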
Similarly, normalizing weights (120-143) would look like this:
Weight Value (pounds) | Normalized Weight Value |
---|---|
120 | 0.0 |
132 | 0.52 |
143 | 1.0 |
Now, both height and weight are rescaled to the same scale (0-1), preventing the machine learning model from being unduly influenced by whichever feature happens to have the larger raw values.
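As a rough check, the sketch below applies Min-Max Scaling to each feature separately and prints the results rounded to two decimals. The dictionary layout and the helper name are assumptions made for illustration; the helper also assumes each column contains at least two distinct values.

```python
def min_max_normalize(values):
    """Min-Max Scaling; assumes max(values) != min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Each feature is scaled using its own minimum and maximum.
students = {
    "height": [63, 67, 71],      # inches
    "weight": [120, 132, 143],   # pounds
}
scaled = {name: min_max_normalize(col) for name, col in students.items()}

for name, col in scaled.items():
    print(name, [round(v, 2) for v in col])
```

Both columns now start at 0.0 and end at 1.0. The middle weight lands slightly above 0.5 because 132 is not exactly halfway between 120 and 143.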
In the next session, we'll delve into another method for rescaling data, Standardization.