Standardizing Data Size
In this lesson, we will learn how to adjust the scale of data using standardization.
What is Standardization?
Standardization is a method of transforming data so that its mean becomes 0 and its standard deviation (the degree of data spread) becomes 1.
The mean refers to the central value of the data points, while the standard deviation indicates how much the data points are spread out from the mean.
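Standardized data has a consistent distribution centered on a mean of 0. Because the result is not squeezed into a fixed range, standardization is generally less sensitive to outliers than min-max normalization, although extreme values still affect the mean and standard deviation.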
An outlier is an extreme value that significantly differs from other values in a dataset. For example, a student with a height of 7 feet can be considered an outlier.
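To make the idea of an outlier concrete, here is a minimal Python sketch. The height values are hypothetical and chosen only for illustration; the 84-inch entry stands in for the 7-foot student. It expresses each height as a distance from the mean in standard-deviation units, assuming the divide-by-N (population) standard deviation.

```python
from statistics import fmean, pstdev

# Hypothetical heights in inches; 84 in (7 ft) plays the role of the outlier.
heights = [63, 65, 67, 69, 71, 84]

mean = fmean(heights)
std = pstdev(heights)  # population standard deviation (divides by N)

# Distance of each value from the mean, measured in standard deviations
z_scores = [(h - mean) / std for h in heights]

for h, z in zip(heights, z_scores):
    print(f"{h} in -> {z:+.2f} standard deviations from the mean")
```

In this made-up dataset the 84-inch value sits more than two standard deviations above the mean, while every other value stays within about one.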
How to Calculate Standard Deviation
To standardize, we first need to calculate the standard deviation ($\sigma$).
The standard deviation is calculated using the following formula:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
- $N$: Number of data points
- $x_i$: Individual data value
- $\mu$: Mean
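For example, consider a dataset of heights: 5.3, 5.9, and 5.11 feet. The mean is (5.3 + 5.9 + 5.11) / 3 ≈ 5.44, so the standard deviation is calculated as:
$$\sqrt{\frac{(5.3 - 5.44)^2 + (5.9 - 5.44)^2 + (5.11 - 5.44)^2}{3}} \approx 0.34 \text{ feet}$$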
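The calculation can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper name `population_std` is hypothetical, the three values are used exactly as written in the example, and the sketch assumes the standard deviation divides by the number of data points, which matches the 3.27 obtained for heights later in this lesson.

```python
import math
from statistics import pstdev

def population_std(values):
    """Standard deviation that divides by N, the number of data points."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / n)

heights = [5.3, 5.9, 5.11]  # values as written in the example
sigma = population_std(heights)
print(round(sigma, 2))

# statistics.pstdev uses the same divide-by-N definition, so the two agree
print(math.isclose(sigma, pstdev(heights)))
```

Python's `statistics.stdev` divides by N - 1 instead and would give a slightly larger value for the same data.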
Standardizing Students' Height (in) and Weight (lb)
Standardization is performed using the following formula:
New Value = (Original Value - Mean) / Standard Deviation
Let's standardize the following dataset:
Height (in) | Weight (lb) |
---|---|
63 | 121 |
67 | 132 |
71 | 143 |
1. Calculate the Mean and Standard Deviation for Height
- Mean: (63 + 67 + 71) / 3 = 67
- Standard Deviation: √(((-4)² + 0² + 4²) / 3) = √(32 / 3) ≈ 3.27
2. Apply Standardization to Height Data
Each value is standardized as shown below:
(63 - 67) / 3.27 ≈ -1.22
(67 - 67) / 3.27 = 0
(71 - 67) / 3.27 ≈ 1.22
The transformed results are:
Original Height (in) | Standardized Height |
---|---|
63 | -1.22 |
67 | 0.00 |
71 | 1.22 |
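The height calculation above can be reproduced with a few lines of Python. This is a small sketch using only the standard library; the function name `standardize` is an illustrative choice, and it assumes the population standard deviation (`statistics.pstdev`), which is what yields 3.27 for this data.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return (value - mean) / standard deviation for each value."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation, as in step 1
    return [(v - mean) / std for v in values]

heights = [63, 67, 71]
z_heights = standardize(heights)
print([round(z, 2) for z in z_heights])  # [-1.22, 0.0, 1.22]

# The standardized values should have mean 0 and standard deviation 1
print(fmean(z_heights), pstdev(z_heights))
```

The final line checks the property from the definition of standardization: after the transformation, the mean is 0 and the standard deviation is 1, up to floating-point rounding.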
3. Apply Standardization to Weight Data
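- Mean: (121 + 132 + 143) / 3 = 132
- Standard Deviation: √(((-11)² + 0² + 11²) / 3) = √(242 / 3) ≈ 8.98
(121 - 132) / 8.98 ≈ -1.22
(132 - 132) / 8.98 = 0
(143 - 132) / 8.98 ≈ 1.22
The transformed results:
Original Weight (lb) | Standardized Weight |
---|---|
121 | -1.22 |
132 | 0.00 |
143 | 1.22 |
The standardized results for both height and weight together are:
Standardized Height | Standardized Weight |
---|---|
-1.22 | -1.22 |
0.00 | 0.00 |
1.22 | 1.22 |
Because both columns are evenly spaced around their means, they end up with identical standardized values even though their original units and ranges differ.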
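As a rough check of the combined table, the sketch below standardizes both columns with the same helper. It assumes the divide-by-N (population) standard deviation that yields 3.27 for height. `statistics.stdev`, which divides by N - 1, is included for comparison because it yields 11.0 for weight, so the choice of convention changes the standardized values. The helper name and its `std_func` parameter are illustrative choices.

```python
from statistics import fmean, pstdev, stdev

# (height in inches, weight in pounds) rows from the dataset above
rows = [(63, 121), (67, 132), (71, 143)]

def standardize(values, std_func=pstdev):
    """Standardize values using the given standard deviation function."""
    mean = fmean(values)
    std = std_func(values)
    return [(v - mean) / std for v in values]

# Split the rows into one sequence per column
heights, weights = zip(*rows)

z_heights = standardize(heights)
z_weights = standardize(weights)
for zh, zw in zip(z_heights, z_weights):
    print(f"{zh:+.2f} | {zw:+.2f}")

# Sample standard deviation (divides by N - 1) gives a different scale
weight_sample_std = stdev(weights)
z_weights_sample = standardize(weights, std_func=stdev)
print(weight_sample_std, [round(z, 2) for z in z_weights_sample])
```

Whichever convention is chosen, it should be applied to every column in the same way, so that the standardized features remain comparable.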
By standardizing the height and weight data in this manner, you put both features on the same scale, so neither one dominates during AI model training simply because its raw values are larger.
In the next lesson, we will explore the differences between normalization and standardization and learn about the situations each is best suited for.