
The Core Principle of Neural Network Learning: Gradient Descent

Gradient descent is an algorithm used in machine learning and deep learning to find the optimal weights for an AI model.

Simply put, gradient descent is akin to descending a mountain.

To minimize the difference between predicted and actual values as quickly as possible, it uses the gradient to find the steepest downhill direction and moves in that direction step by step.

AI models repeat this process to optimize weights for increasingly accurate predictions.

Concept of Gradient Descent
Loss Function = Height of the Mountain
Weight Adjustment = Adjusting the Descent Direction
Gradient = Indicates how steep it is
Learning Rate = Determines how much to move in one step

How Gradient Descent Works

Gradient descent optimizes weights by repeating the following steps:


1. Calculate the Loss Function

Compute the difference between predicted values and actual values with the current weights.

Use a loss function to quantify the error.

Loss Function Example
Actual Value: 1.0, Predicted Value: 0.6
Loss (MSE) = (1.0 - 0.6)^2 = 0.16
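The same calculation can be reproduced in a few lines of Python (for a single sample, the MSE is just the squared error):

```python
# Single-sample MSE: the squared difference between actual and predicted values
actual = 1.0
predicted = 0.6

loss = (actual - predicted) ** 2
print(round(loss, 2))  # 0.16
```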

2. Calculate the Gradient

Differentiate the loss function to obtain the gradient, which indicates the direction that reduces the loss fastest from the current position.

Gradient Calculation Example
Current Weight: 0.5
Gradient: -0.3 (negative, so increasing the weight reduces the loss)
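The numbers above are only illustrative. As a minimal sketch, assuming a toy model where predicted = weight * x with a single training example, the gradient of the squared-error loss with respect to the weight can be computed analytically and checked with a finite difference:

```python
# Toy setup (assumed for illustration): predicted = weight * x, squared-error loss
x, actual = 1.0, 1.0
weight = 0.5

def loss(w):
    return (actual - w * x) ** 2

# Analytic gradient: dL/dw = -2 * x * (actual - w * x)
gradient = -2 * x * (actual - weight * x)
print(gradient)  # -1.0 -> negative, so increasing the weight lowers the loss

# Numerical check with a central finite difference
eps = 1e-6
numeric = (loss(weight + eps) - loss(weight - eps)) / (2 * eps)
print(round(numeric, 4))  # -1.0
```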

3. Update the Weight

Adjust the weight based on the gradient.

Multiply by the Learning Rate (α) to determine the step size.

The formula is as follows:

New Weight = Current Weight - (Learning Rate × Gradient)
Weight Update Example
Current Weight: 0.8
Gradient: -0.2
Learning Rate: 0.1
New Weight: 0.8 - (0.1 * -0.2) = 0.82
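In Python, the update with these numbers looks like this:

```python
# Weight update: new_weight = current_weight - learning_rate * gradient
current_weight = 0.8
gradient = -0.2
learning_rate = 0.1

new_weight = current_weight - learning_rate * gradient
print(round(new_weight, 2))  # 0.82
```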

By repeating this process, weights get closer to optimal values, resulting in more accurate predictions by the neural network.
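Putting the three steps together, a minimal sketch of the full loop could look like the following (the toy model predicted = weight * x and the starting values are assumptions for illustration):

```python
# Minimal gradient descent loop for a toy model: predicted = weight * x
x, actual = 1.0, 1.0   # one training example (assumed)
weight = 0.5           # initial weight (assumed)
learning_rate = 0.1

for step in range(1, 21):
    predicted = weight * x
    loss = (actual - predicted) ** 2            # 1. calculate the loss
    gradient = -2 * x * (actual - predicted)    # 2. calculate the gradient
    weight = weight - learning_rate * gradient  # 3. update the weight
    if step % 5 == 0:
        print(f"step {step:2d}  weight={weight:.4f}  loss={loss:.4f}")

# As the steps repeat, the weight approaches 1.0 and the loss approaches 0.
```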


Gradient descent is a key method for neural networks to find optimal weights, and choosing an appropriate learning rate is crucial.

A learning rate that's too large may overshoot the optimal value, while one that's too small could slow down learning.
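This effect can be seen by running the same toy loop (predicted = weight * x, actual value 1.0) with different learning rates:

```python
# Compare learning rates on the toy problem: predicted = weight * x
def train(learning_rate, steps=10, weight=0.5, x=1.0, actual=1.0):
    for _ in range(steps):
        gradient = -2 * x * (actual - weight * x)
        weight = weight - learning_rate * gradient
    return weight

print(train(0.01))  # too small: the weight barely moves toward 1.0
print(train(0.1))   # reasonable: the weight gets close to 1.0
print(train(1.1))   # too large: the updates overshoot and diverge
```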

To address these issues, variants of gradient descent such as Stochastic Gradient Descent (SGD), Batch Gradient Descent (BGD), Momentum, and Adam are used.

In the next lesson, we will explore stochastic gradient descent in detail.
