Learning with Stability using Batch Gradient Descent
Batch Gradient Descent (BGD) is a method used in machine learning and deep learning where the model updates its weights using the entire dataset in a single pass.
Since Batch Gradient Descent calculates the gradient using all data samples, the learning process is stable, and the loss function decreases steadily from iteration to iteration (assuming a suitable learning rate).
However, a downside is that as the data size increases, the computational cost also rises.
The Process of Batch Gradient Descent
Batch Gradient Descent proceeds through the following steps:
- Calculate the loss function using the entire dataset
- Compute the average gradient of all samples
- Update the weights
- Repeat these steps to find the optimal values
This method helps ensure the neural network learns in a consistent direction.
How Batch Gradient Descent Works
Batch Gradient Descent follows these steps during learning:
1. Calculate the Loss Function
Using the entire dataset, calculate the difference between the model's predictions and the actual values, and summarize it as a single loss value (here, the mean squared error).
Actual values: [1.0, 2.0, 3.0]
Predicted values: [0.8, 1.9, 3.2]
Loss (MSE) = Mean((1.0-0.8)^2, (2.0-1.9)^2, (3.0-3.2)^2) = (0.04 + 0.01 + 0.04) / 3 = 0.03
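For reference, the same calculation can be reproduced in a few lines of NumPy (a minimal sketch using the example values above; the variable names are illustrative):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])  # actual values from the example
y_pred = np.array([0.8, 1.9, 3.2])  # predicted values from the example

# Mean squared error over the entire dataset
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # ≈ 0.03 -> (0.04 + 0.01 + 0.04) / 3
```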
2. Calculate the Gradient
Compute the gradient for all samples, then find the average to determine the direction for minimizing the loss.
Gradient for each sample:
Sample1: -0.2
Sample2: -0.1
Sample3: 0.2
Average Gradient: (-0.2 + -0.1 + 0.2) / 3 = -0.1 / 3 ≈ -0.03
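In code, this averaging step is a single mean over the per-sample gradients (a minimal sketch with the example values; names are illustrative):

```python
import numpy as np

# Per-sample gradients from the example above
sample_gradients = np.array([-0.2, -0.1, 0.2])

# Batch Gradient Descent uses the average gradient over ALL samples
average_gradient = np.mean(sample_gradients)
print(average_gradient)  # ≈ -0.033, rounded to -0.03 in the example
```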
3. Update the Weights
Adjust the weights by subtracting the product of the learning rate and the gradient.
Initial weight: 0.8
Gradient: -0.03
Learning rate: 0.1
New weight: 0.8 - (0.1 * -0.03) = 0.8 + 0.003 = 0.803
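The same update written as code (values taken directly from the example; variable names are illustrative):

```python
weight = 0.8         # initial weight
gradient = -0.03     # average gradient from the previous step
learning_rate = 0.1

# Gradient descent update rule: weight <- weight - learning_rate * gradient
weight = weight - learning_rate * gradient
print(weight)  # ≈ 0.803
```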
By repeating this process multiple times, the weights gradually converge to optimal values, improving the model's prediction accuracy.
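Putting the three steps together, the following is a minimal, self-contained sketch of Batch Gradient Descent for a one-weight linear model y ≈ w * x. The toy dataset, learning rate, and iteration count are assumptions chosen for illustration, not values from this lesson:

```python
import numpy as np

# Toy dataset (illustrative): targets follow y = 2 * x, so the optimal weight is 2.0
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0              # initial weight
learning_rate = 0.05
num_iterations = 100

for _ in range(num_iterations):
    # 1. Predictions and loss over the ENTIRE dataset (mean squared error)
    predictions = w * X
    loss = np.mean((predictions - y) ** 2)

    # 2. Average gradient of the loss with respect to w over all samples
    gradient = np.mean(2 * (predictions - y) * X)

    # 3. Weight update
    w -= learning_rate * gradient

print(round(w, 4))  # ≈ 2.0 once the loop has converged
```

Because every update averages over all four samples, the loss on this toy problem decreases smoothly toward zero rather than fluctuating.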
Batch Gradient Descent vs Stochastic Gradient Descent
Method | Data Handling | Speed | Stability |
---|---|---|---|
Batch Gradient Descent | Uses the entire dataset | Slow | Stable; gradually converges to the optimal value |
Stochastic Gradient Descent | Uses one sample at a time | Fast | Unstable; may fluctuate around the optimal value |
Batch Gradient Descent provides a stable learning process in which the loss function steadily decreases, though convergence is slow. Stochastic Gradient Descent, on the other hand, learns faster but can cause the weights to fluctuate unstably.
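To make the contrast concrete, here is a minimal sketch comparing one pass over the data under each method; the toy data, learning rate, and helper function are illustrative assumptions:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
learning_rate = 0.05

def mse_gradient(w, x_batch, y_batch):
    # Gradient of the mean squared error for the model y ≈ w * x
    return np.mean(2 * (w * x_batch - y_batch) * x_batch)

# Batch Gradient Descent: ONE update per pass, using every sample at once
w_batch = 0.0
w_batch -= learning_rate * mse_gradient(w_batch, X, y)

# Stochastic Gradient Descent: one update PER SAMPLE, in sequence
w_sgd = 0.0
for x_i, y_i in zip(X, y):
    w_sgd -= learning_rate * mse_gradient(w_sgd, np.array([x_i]), np.array([y_i]))

print(w_batch, w_sgd)
```

After one pass, the batch version has taken a single smooth step toward the optimal weight, while the stochastic version has taken four noisier steps that can overshoot it.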
Batch Gradient Descent is useful for situations where the dataset is not too large or when stable learning is required.
With larger datasets, the computational demand increases, so employing Mini-Batch Gradient Descent often helps maintain a balance between speed and stability.
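As a rough illustration of the mini-batch idea (the batch size, data, and variable names below are assumptions for the sketch), the dataset is split into small chunks and the weights are updated once per chunk:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2 * X                    # toy targets for y ≈ w * x, optimal w = 2.0
w = 0.0
learning_rate = 0.02
batch_size = 2

for _ in range(50):                              # epochs
    for start in range(0, len(X), batch_size):
        x_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        # Average gradient over the mini-batch only
        gradient = np.mean(2 * (w * x_batch - y_batch) * x_batch)
        w -= learning_rate * gradient

print(round(w, 4))  # ≈ 2.0
```

Each update is cheaper than a full-batch step but still averages over several samples, which is why mini-batches usually strike a balance between the speed of Stochastic Gradient Descent and the stability of Batch Gradient Descent.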
In the next lesson, we will explore momentum optimization techniques further.