Batch Normalization to Stabilize Neural Network Training
Batch Normalization (BN) is a technique that helps a neural network train faster and more stably.
By normalizing the distribution of each layer's inputs, it reduces the vanishing gradient problem and speeds up training.
Why is Batch Normalization Necessary?
As deep learning models grow deeper, a problem called Internal Covariate Shift can arise, where the distribution of each layer's inputs keeps changing during training.
Because the outputs of earlier layers shift significantly as their weights update, the later layers must constantly adapt to a moving input distribution, which makes stable learning difficult.
To solve this problem, batch normalization is employed.
Batch normalization adjusts the mean and variance of the input data at each layer, minimizing distribution changes and helping the network learn more stably.
Simply put, batch normalization maintains a consistent distribution of data during training, enabling stable learning.
How Does Batch Normalization Work?
Batch normalization stabilizes learning through the following steps:
1. Normalization at the Mini-Batch Level
Compute the mean and variance of the data in the current mini-batch (the set of samples used in one training step).
Use these statistics to normalize the input data so that values fall within a consistent range.
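In standard notation (these symbols follow the original batch normalization paper and are not defined elsewhere in this lesson), each value $x_i$ in a mini-batch of size $m$ is normalized as follows, where $\epsilon$ is a small constant that prevents division by zero:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2, \qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

For example, a mini-batch of activations $[2, 4, 6]$ has mean $4$ and variance $8/3$, so the normalized values are roughly $[-1.22,\ 0,\ 1.22]$, which have mean $0$ and variance $1$.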
2. Learnable Scale and Shift Parameters
Normalization alone can limit the model's expressive power, so batch normalization adds a learnable scale and shift to each layer, allowing the model to retain the information it needs.
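Using the same notation as above, the learnable scale $\gamma$ and shift $\beta$ transform each normalized value into the layer's output:

$$y_i = \gamma\,\hat{x}_i + \beta$$

Because $\gamma$ and $\beta$ are learned during training, the network can partially or fully undo the normalization wherever that is useful, so no expressive power is lost.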
3. Repeating the Process at Each Layer
By repeating this process at each layer, the neural network can learn more stably.
By adjusting the input distribution at each layer, batch normalization keeps neuron activations from shrinking toward extremely small values (a cause of vanishing gradients) and increases learning speed.
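As a concrete illustration, here is a minimal sketch in PyTorch (not part of the original lesson; the layer sizes and batch size are arbitrary examples) that places a batch normalization layer between a linear layer and its activation:

```python
import torch
import torch.nn as nn

# A small fully connected network with batch normalization applied
# to the 256 hidden activations before the ReLU non-linearity.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),  # normalizes each of the 256 features across the mini-batch
    nn.ReLU(),
    nn.Linear(256, 10),
)

# A random mini-batch of 32 samples with 784 features each.
x = torch.randn(32, 784)

model.train()          # training mode: use the current mini-batch's mean and variance
out = model(x)
print(out.shape)       # torch.Size([32, 10])

model.eval()           # evaluation mode: use running statistics gathered during training
out = model(x)
```

During training, `nn.BatchNorm1d` normalizes with the statistics of the current mini-batch and learns its own scale and shift parameters; at evaluation time it switches to running averages of the mean and variance, which is why calling `model.eval()` before inference matters.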
In the next lesson, we'll solve a brief quiz based on what we've learned so far.