Python + AI for Geeks

Standardizing Data Size

In this lesson, we will learn how to adjust the scale of data using standardization.


What is Standardization?

Standardization is a method of transforming data so that its mean becomes 0 and its standard deviation (the degree of data spread) becomes 1.

The mean is the average of the data points (their sum divided by their count), while the standard deviation measures how far the data points typically spread out from the mean.
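A minimal sketch of the definition using Python's built-in `statistics` module. The input numbers are illustrative and not from the lesson. It standardizes a list and checks that the result has mean 0 and standard deviation 1, using `pstdev`, which divides by N:

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores using the population standard deviation (divide by N)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative numbers, not from the lesson
z = standardize(data)

print(z)                    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z), pstdev(z))   # mean of 0 and standard deviation of 1
```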

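Standardized data is centered at 0 with a standard deviation of 1, which puts features measured in different units on a common scale. Because it does not squeeze values into a fixed range the way min-max normalization does, it is generally less distorted by outliers, although outliers still shift the mean and inflate the standard deviation.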

An outlier is an extreme value that significantly differs from other values in a dataset. For example, a student with a height of 7 feet can be considered an outlier.
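To see how an outlier affects the statistics that standardization relies on, this small sketch adds a hypothetical 7-foot (84 in) student to the heights used later in this lesson and compares the mean, the standard deviation, and the z-score of the 67-inch student:

```python
from statistics import mean, pstdev

heights = [63, 67, 71]            # heights in inches (from this lesson's table)
with_outlier = heights + [84]     # add a hypothetical 7-foot (84 in) student

for label, data in [("without outlier", heights), ("with outlier", with_outlier)]:
    mu, sigma = mean(data), pstdev(data)
    z_67 = (67 - mu) / sigma      # z-score of the 67-inch student
    print(f"{label}: mean={mu:.2f}, std={sigma:.2f}, z(67)={z_67:.2f}")
```

The single outlier moves the mean from 67.00 to 71.25 and the standard deviation from 3.27 to 7.89, so the 67-inch student goes from exactly average (z = 0.00) to below average (z = -0.54).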


How to Calculate Standard Deviation

To standardize, we first need to calculate the standard deviation (σ).

The standard deviation is calculated using the following formula:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

  • $N$: Number of data points
  • $x_i$: The $i$-th data value
  • $\mu$: Mean of the data
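The formula translates almost line for line into Python. This sketch (the function name `population_std` is our own) implements it directly and compares it with the standard library: `statistics.pstdev` also divides by N, while `statistics.stdev` divides by N − 1 and therefore gives a different number:

```python
import math
import statistics

def population_std(values):
    """Standard deviation with the 1/N factor, matching the formula above."""
    n = len(values)                                   # N
    mu = sum(values) / n                              # μ
    squared_devs = [(x - mu) ** 2 for x in values]    # (x_i - μ)^2
    return math.sqrt(sum(squared_devs) / n)           # square root of the average

sample = [63, 67, 71]
print(population_std(sample))       # ≈ 3.266
print(statistics.pstdev(sample))    # same value: pstdev also divides by N
print(statistics.stdev(sample))     # 4.0: stdev divides by N - 1 instead
```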

For example, consider three heights: 5 ft 3 in, 5 ft 9 in, and 5 ft 11 in, i.e., 63, 69, and 71 inches. The mean is μ = (63 + 69 + 71) / 3 ≈ 67.67 inches, so the standard deviation is:

$$\sigma = \sqrt{\frac{1}{3} \left[(63 - 67.67)^2 + (69 - 67.67)^2 + (71 - 67.67)^2\right]} \approx 3.40$$
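The heights in this example can be checked in code. This sketch assumes the dotted values are feet-and-inches (so "5.11" means 5 ft 11 in, not 5.11 decimal feet). It converts them to inches before computing the mean and the population standard deviation:

```python
from statistics import mean, pstdev

# Heights written as (feet, inches); assumes "5.11" means 5 ft 11 in
heights_ft_in = [(5, 3), (5, 9), (5, 11)]
heights_in = [ft * 12 + inch for ft, inch in heights_ft_in]   # [63, 69, 71]

mu = mean(heights_in)       # ≈ 67.67 in
sigma = pstdev(heights_in)  # population standard deviation (divide by N)
print(heights_in, round(mu, 2), round(sigma, 2))   # [63, 69, 71] 67.67 3.4
```

Converting to a single unit first matters: treating 5.11 as a decimal would place it below 5.9, even though 5 ft 11 in is the tallest of the three.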

Standardizing Students' Height (in) and Weight (lb)

Standardization is performed using the following formula:

Standardization Formula
New Value = (Original Value - Mean) / Standard Deviation
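The formula is a one-line function. In this minimal sketch, `z_score` is an illustrative name, and the mean and standard deviation are passed in using the height values worked out in the steps that follow:

```python
def z_score(x, mean, std):
    """New Value = (Original Value - Mean) / Standard Deviation."""
    return (x - mean) / std

# Height statistics from this lesson: mean 67 in, standard deviation ≈ 3.27 in
print(round(z_score(63, 67, 3.27), 2))   # -1.22
print(round(z_score(71, 67, 3.27), 2))   # 1.22
```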

Let's standardize the following dataset:

| Height (in) | Weight (lb) |
|---|---|
| 63 | 121 |
| 67 | 132 |
| 71 | 143 |
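When the data is stored as a table, each column (feature) is standardized with its own mean and standard deviation. This sketch assumes NumPy is available. It stores the dataset as a 2-D array and standardizes both columns at once with `axis=0`. Note that `ndarray.std` divides by N by default (`ddof=0`), matching this lesson's formula:

```python
import numpy as np

# Rows are students; columns are height (in) and weight (lb)
data = np.array([
    [63, 121],
    [67, 132],
    [71, 143],
], dtype=float)

means = data.mean(axis=0)   # per-column mean: 67 in and 132 lb
stds = data.std(axis=0)     # per-column std; the default ddof=0 divides by N
standardized = (data - means) / stds   # broadcasting applies each column's stats

print(np.round(standardized, 2))
```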

1. Calculate the Mean and Standard Deviation for Height

  • Mean: (63 + 67 + 71) / 3 = 67

  • Standard Deviation: √[((63 − 67)² + (67 − 67)² + (71 − 67)²) / 3] = √(32 / 3) ≈ 3.27


2. Apply Standardization to Height Data

Each value is standardized as shown below:

Height Standardization Example
(63 - 67) / 3.27 ≈ -1.22
(67 - 67) / 3.27 = 0
(71 - 67) / 3.27 ≈ 1.22
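The same step can be computed directly from the data instead of plugging in the rounded 3.27. This sketch uses the unrounded population standard deviation and still gives the values shown here once rounded to two decimals:

```python
from statistics import mean, pstdev

heights = [63, 67, 71]            # inches
mu = mean(heights)                # 67
sigma = pstdev(heights)           # ≈ 3.266 (unrounded)

standardized = [round((h - mu) / sigma, 2) for h in heights]
print(standardized)               # [-1.22, 0.0, 1.22]
```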

The transformed results are:

| Original Height (in) | Standardized Height |
|---|---|
| 63 | -1.22 |
| 67 | 0.00 |
| 71 | 1.22 |

3. Apply Standardization to Weight Data

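  • Mean: (121 + 132 + 143) / 3 = 132
  • Standard Deviation: √[((121 − 132)² + (132 − 132)² + (143 − 132)²) / 3] = √(242 / 3) ≈ 8.98
Weight Standardization Example
(121 - 132) / 8.98 ≈ -1.22
(132 - 132) / 8.98 = 0
(143 - 132) / 8.98 ≈ 1.22

The transformed results:

| Original Weight (lb) | Standardized Weight |
|---|---|
| 121 | -1.22 |
| 132 | 0.00 |
| 143 | 1.22 |

The standardized results for both height and weight together are shown below. Because each column is evenly spaced around its own mean, both features end up with the same standardized values:

| Standardized Height | Standardized Weight |
|---|---|
| -1.22 | -1.22 |
| 0.00 | 0.00 |
| 1.22 | 1.22 |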
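The lesson's formula divides by N. Dividing by N − 1 instead (the sample standard deviation) gives exactly 11.0 for these weights and standardized values of about ±1.00, a common source of mismatched results. A short sketch comparing the two, assuming NumPy is available:

```python
import numpy as np

weights = np.array([121, 132, 143], dtype=float)   # pounds

pop_std = weights.std()            # ddof=0: divide by N, as in this lesson's formula
sample_std = weights.std(ddof=1)   # ddof=1: divide by N - 1 (sample standard deviation)

print(round(pop_std, 2), round(sample_std, 2))                # 8.98 11.0
print(np.round((weights - weights.mean()) / pop_std, 2))      # about -1.22, 0, 1.22
print(np.round((weights - weights.mean()) / sample_std, 2))   # -1, 0, 1
```

Whichever convention you choose, apply it consistently to every feature. Otherwise standardized columns are no longer directly comparable.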

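Before standardization, the weight values span 22 lb while the height values span only 8 in, so a model that is sensitive to feature scale could let weight dominate simply because its numbers vary more. After standardization, both features have a mean of 0 and a standard deviation of 1, so they are represented on a comparable scale during AI model training.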
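One way to see why this matters: many models, for example distance-based ones such as k-nearest neighbors, compare samples using squared differences. This sketch computes how much each feature contributes to the squared distance between the first two students, before and after standardization:

```python
from statistics import mean, pstdev

heights = [63, 67, 71]     # inches
weights = [121, 132, 143]  # pounds

def standardize(values):
    """Z-scores using the population standard deviation (divide by N)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

z_heights = standardize(heights)
z_weights = standardize(weights)

# Squared difference each feature contributes between student 1 and student 2
raw_contrib = ((heights[1] - heights[0]) ** 2, (weights[1] - weights[0]) ** 2)
std_contrib = ((z_heights[1] - z_heights[0]) ** 2, (z_weights[1] - z_weights[0]) ** 2)

print(raw_contrib)                              # (16, 121): weight dominates
print(tuple(round(c, 2) for c in std_contrib))  # (1.5, 1.5): equal contributions
```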

In the next lesson, we will explore the differences between normalization and standardization and learn about the situations each is best suited for.
