Stochastic Gradient Descent, a Fast and Efficient Learning Method
Stochastic Gradient Descent (SGD) updates the weights of a neural network by randomly selecting one data sample at each step.
Because each update uses only a single sample, this approach requires little computation per step and can converge quickly, making it widely used for training on large datasets.
The Process of Stochastic Gradient Descent
Stochastic Gradient Descent is performed by repeating the following steps:
- Select one data sample
- Compute the loss
- Calculate the gradient
- Update the weight
- Repeat with the next sample
By repeating this process, the model gradually finds the optimal weights.
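To see how these steps fit together, here is a minimal sketch of an SGD loop in Python. It assumes a one-weight linear model (prediction = w * x) and squared-error loss; the dataset, initial weight, and learning rate are illustrative values, not taken from the lesson.

```python
import random

# Illustrative dataset of (input, target) pairs; here target = 2.5 * input,
# so the loop should learn a weight close to 2.5.
data = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5), (4.0, 10.0)]

w = 0.5               # initial weight (assumed)
learning_rate = 0.01  # small step size keeps the single-sample updates stable

for epoch in range(50):
    random.shuffle(data)                      # visit samples in random order
    for x, y in data:                         # 1. select one data sample
        prediction = w * x
        loss = (y - prediction) ** 2          # 2. compute the loss for this sample
        gradient = -2 * (y - prediction) * x  # 3. calculate the gradient d(loss)/dw
        w = w - learning_rate * gradient      # 4. update the weight
                                              # 5. repeat with the next sample

print(f"learned weight: {w:.3f}")  # approaches 2.5
```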
Mechanism of Stochastic Gradient Descent
SGD proceeds with learning through the following steps:
1. Selection of Sample and Loss Calculation
Randomly select one sample (x, y) from the dataset and calculate the loss using the current weights.
Input data: x = 2.0, Actual value: y = 5.0
Model prediction: 4.2
Loss (MSE) = (5.0 - 4.2)^2 = 0.64
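Written as code, the same calculation looks like this. The weight value 2.1 is an assumption chosen so that the prediction for x = 2.0 matches the 4.2 above; the lesson does not state the weight for this step.

```python
x, y = 2.0, 5.0   # input data and actual value from the example
w = 2.1           # assumed weight so that w * x reproduces the prediction 4.2

prediction = w * x              # 4.2
loss = (y - prediction) ** 2    # (5.0 - 4.2)^2 = 0.64

print(prediction, loss)  # 4.2 0.64 (up to floating-point rounding)
```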
2. Gradient Calculation
Calculate the gradient of the loss function to determine how much the current weight needs to be adjusted.
Current weight: 0.5
Gradient: -0.3
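For a one-weight linear model with squared-error loss, the gradient with respect to the weight is -2 * (y - w * x) * x. The sample values below are assumptions picked so that the result matches the -0.3 shown above, since the lesson gives only the current weight and the resulting gradient.

```python
w = 0.5            # current weight from the example
x, y = 1.0, 0.65   # assumed sample chosen so the gradient comes out to -0.3

prediction = w * x                    # 0.5
gradient = -2 * (y - prediction) * x  # derivative of (y - w*x)^2 w.r.t. w

print(gradient)  # about -0.3
```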
3. Update Weights
Use the gradient to update the weights, multiplying it by the learning rate (α) to control how large each update step is.
Current weight: 0.8
Gradient: -0.2
Learning rate: 0.1
New weight: 0.8 - (0.1 * -0.2) = 0.82
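The update itself is a single line of code. The snippet below simply restates the arithmetic above using the lesson's illustrative numbers.

```python
w = 0.8             # current weight
gradient = -0.2
learning_rate = 0.1

w = w - learning_rate * gradient   # 0.8 - (0.1 * -0.2)
print(w)                           # 0.82
```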
By repeating this process over all data samples, the weights are gradually optimized.
Stochastic Gradient Descent is an essential optimization technique for rapidly learning from large datasets.
In the next lesson, we will explore Batch Gradient Descent.