Learning with Stability using Batch Gradient Descent

Batch Gradient Descent (BGD) is an optimization method used in machine learning and deep learning in which the model updates its weights using the gradient computed over the entire training dataset at each step.

Since Batch Gradient Descent calculates the gradient using all data samples, the learning process is stable, and the loss function consistently decreases in each iteration.

However, a downside is that as the dataset grows, the cost of each update also rises, since every step requires a pass over all of the data.


The Process of Batch Gradient Descent

Batch Gradient Descent proceeds through the following steps:

  1. Calculate the loss function using the entire dataset

  2. Compute the average gradient of all samples

  3. Update the weights

  4. Repeat these steps to find the optimal values

This method helps ensure the neural network learns in a consistent direction.
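
To make these steps concrete, here is a minimal NumPy sketch of a Batch Gradient Descent loop for a single-weight linear model (prediction = w * x); the toy data, learning rate, and number of epochs are illustrative assumptions, not values from this lesson.

import numpy as np

# Toy dataset (assumed for illustration): the true relationship is y = x
X = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

w = 0.8              # initial weight
learning_rate = 0.1

for epoch in range(100):
    # 1. Loss over the entire dataset (mean squared error)
    predictions = w * X
    loss = np.mean((y - predictions) ** 2)

    # 2. Average gradient of the loss with respect to w over all samples
    gradient = np.mean(-2 * X * (y - predictions))

    # 3. Update the weight in the opposite direction of the gradient
    w -= learning_rate * gradient

# 4. After many repetitions, w approaches the optimal value (1.0 here)
print(f"final weight: {w:.4f}, final loss: {np.mean((y - w * X) ** 2):.6f}")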


How Batch Gradient Descent Works

Batch Gradient Descent follows these steps during learning:


1. Calculate the Loss Function

Using the entire dataset, calculate the differences between the model's predictions and the actual values, and summarize them as a single loss value, such as the mean squared error (MSE).

Loss Function Example
Actual values: [1.0, 2.0, 3.0]
Predicted values: [0.8, 1.9, 3.2]
Loss (MSE) = Mean((1.0-0.8)^2, (2.0-1.9)^2, (3.0-3.2)^2) = (0.04 + 0.01 + 0.04) / 3 = 0.03
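
As a quick check of this arithmetic, the same MSE can be computed with NumPy; this small sketch just reuses the values above.

import numpy as np

actual = np.array([1.0, 2.0, 3.0])
predicted = np.array([0.8, 1.9, 3.2])

# Mean squared error over the whole batch
mse = np.mean((actual - predicted) ** 2)
print(mse)  # (0.04 + 0.01 + 0.04) / 3 = 0.03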

2. Calculate the Gradient

Compute the gradient for each sample, then average them to determine the direction that reduces the loss.

Gradient Calculation Example
Gradient for each sample:
Sample1: -0.2
Sample2: -0.1
Sample3: 0.2
Average Gradient: (-0.2 + -0.1 + 0.2) / 3 = -0.1 / 3 ≈ -0.03
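
The same averaging can be done with NumPy, reusing the per-sample gradients from the example above.

import numpy as np

# Per-sample gradients from the example above
sample_gradients = np.array([-0.2, -0.1, 0.2])

# Batch Gradient Descent averages the gradient over all samples
average_gradient = np.mean(sample_gradients)
print(average_gradient)  # -0.1 / 3 ≈ -0.0333, rounded to -0.03 in this lesson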

3. Update the Weights

Adjust the weights by subtracting the learning rate times the average gradient from the current weight.

Weight Update Example
Initial weight: 0.8
Gradient: -0.03
Learning rate: 0.1
New weight: 0.8 - (0.1 * -0.03) = 0.803
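
In code, this update is a single subtraction; the sketch below reuses the rounded numbers from the example above.

weight = 0.8          # current weight
gradient = -0.03      # average gradient (rounded value from the example)
learning_rate = 0.1

# Gradient descent step: move the weight against the gradient
weight = weight - learning_rate * gradient
print(weight)  # 0.8 - (0.1 * -0.03) ≈ 0.803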

By repeating this process multiple times, the weights gradually converge to optimal values, improving the model's prediction accuracy.


Batch Gradient Descent vs Stochastic Gradient Descent

Method                      | Data Handling              | Speed | Stability
Batch Gradient Descent      | Uses the entire dataset    | Slow  | Stable; gradually converges to the optimal value
Stochastic Gradient Descent | Uses one sample at a time  | Fast  | Unstable; may fluctuate around the optimal value

Batch Gradient Descent provides a stable learning process, with the loss function steadily decreasing, though convergence is slow.

On the other hand, Stochastic Gradient Descent offers faster learning but may lead to unstable weight fluctuations.

Batch Gradient Descent is useful for situations where the dataset is not too large or when stable learning is required.

With larger datasets, the computational demand increases, so employing Mini-Batch Gradient Descent often helps maintain a balance between speed and stability.
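
As a rough illustration of that compromise, the sketch below trains the same kind of single-weight model with Mini-Batch Gradient Descent; the dataset, batch size, learning rate, and epoch count are assumptions made for the example.

import numpy as np

# Illustrative dataset (assumed): the true relationship is y = 2 * x
X = np.arange(1.0, 11.0)
y = 2.0 * X

w = 0.0
learning_rate = 0.01
batch_size = 4

for epoch in range(50):
    # Reshuffle the data each epoch, then update once per mini-batch
    indices = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        X_b, y_b = X[batch], y[batch]
        # Average gradient over this mini-batch only
        gradient = np.mean(-2 * X_b * (y_b - w * X_b))
        w -= learning_rate * gradient

print(f"learned weight: {w:.4f}")  # approaches the optimal value 2.0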

In the next lesson, we will explore momentum optimization techniques further.
