Utilizing Momentum Optimization with Acceleration

Momentum optimization enhances basic gradient descent by making learning faster and more stable.

In basic gradient descent, the model moves step by step following only the current gradient, whereas momentum optimization also factors in the previous direction of movement, producing a smoother and more consistent path.

In a physics analogy, this is like a ball rolling down a slope: it picks up speed and keeps being carried along its previous direction of motion.

Because the ball has inertia, it does not stop abruptly; instead, it glides toward the optimal point (the optimal weights).


  • Basic Gradient Descent: Moves step by step following only the current gradient (sudden changes in direction are possible)

  • Momentum Optimization: Moves smoothly by considering previous movement directions

Momentum optimization helps models converge more stably in the presence of fluctuating gradients (e.g., loss functions with sharp changes).


How Momentum Optimization Works

Momentum optimization works in the following two steps:


1. Addition of a Velocity Variable

In basic gradient descent, weights (W) are updated directly.

Momentum optimization incorporates a velocity variable to account for the influence of previous gradients.

$$v_t = \beta v_{t-1} - \alpha \frac{\partial L}{\partial W}$$

  • $v_t$: Current velocity (accumulates the influence of past gradients)
  • $\beta$: Momentum coefficient (typically 0.9)
  • $\alpha$: Learning rate
  • $\frac{\partial L}{\partial W}$: Current gradient

2. Weight Update

The weights are then adjusted using this velocity:

$$W_{\text{new}} = W_{\text{old}} + v_t$$
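
Putting the two formulas together, here is a minimal Python sketch of a single momentum update; the function name `momentum_step` and the default values are illustrative, not taken from any specific library.

```python
def momentum_step(w, v_prev, grad, beta=0.9, lr=0.1):
    """One momentum update: compute the new velocity, then adjust the weight by it.

    w      : current weight (W_old)
    v_prev : velocity from the previous step (v_{t-1})
    grad   : current gradient dL/dW
    beta   : momentum coefficient (commonly 0.9)
    lr     : learning rate (alpha)
    """
    v = beta * v_prev - lr * grad   # step 1: velocity update
    w_new = w + v                   # step 2: weight update
    return w_new, v
```

Calling this function repeatedly inside a training loop accumulates velocity across steps, which is what produces the smoothing effect described above.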
Momentum Optimization Example

  • Previous Weight: 0.8
  • Gradient: -0.2
  • Previous Velocity: -0.05
  • Momentum Coefficient (β): 0.9
  • Learning Rate (α): 0.1
  • New Velocity: (0.9 × -0.05) - (0.1 × -0.2) = -0.045 + 0.02 = -0.025
  • New Weight: 0.8 + (-0.025) = 0.775
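
The same arithmetic can be checked with a short Python snippet (variable names are illustrative):

```python
# Illustrative values taken from the example above.
beta, lr = 0.9, 0.1              # momentum coefficient and learning rate
w, v_prev, grad = 0.8, -0.05, -0.2

v = beta * v_prev - lr * grad    # 0.9 * -0.05 - 0.1 * -0.2 = -0.025
w_new = w + v                    # 0.8 + (-0.025) = 0.775

print(round(v, 3), round(w_new, 3))   # -0.025 0.775
```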

By factoring in the previous direction of movement, the model converges to the optimal value more quickly and without unnecessary oscillation.


Momentum Optimization vs Basic Gradient Descent

| Method | Movement Method | Convergence Speed | Stability |
| --- | --- | --- | --- |
| Basic Gradient Descent | Moves following the gradient only | Slow, high fluctuation | Unstable |
| Momentum Optimization | Considers velocity | Fast, reduced fluctuation | Stable |

Momentum optimization facilitates stable learning even when gradients continuously change.
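
To make this comparison concrete, here is a minimal sketch that runs both methods on the simple toy loss L(w) = w² with a deliberately small learning rate; the loss function, starting point, step count, and hyperparameters are chosen for illustration only.

```python
def grad(w):
    """Gradient of the toy loss L(w) = w ** 2."""
    return 2 * w

lr, beta, steps = 0.01, 0.9, 100
w0 = 5.0  # starting point; the minimum is at w = 0

# Basic gradient descent: follows only the current gradient.
w_gd = w0
for _ in range(steps):
    w_gd -= lr * grad(w_gd)

# Momentum optimization: accumulates a velocity across steps.
w_mom, v = w0, 0.0
for _ in range(steps):
    v = beta * v - lr * grad(w_mom)
    w_mom += v

# With these settings, momentum ends up much closer to the minimum at w = 0
# (roughly 0.02 for momentum vs roughly 0.66 for plain gradient descent).
print(f"gradient descent: {w_gd:.3f}   momentum: {w_mom:.3f}")
```

With a small learning rate, plain gradient descent crawls toward the minimum, while momentum accumulates velocity in a consistent direction and covers the same ground in far fewer effective steps.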


In short, momentum optimization enables faster and more stable learning than basic gradient descent.

In the next lesson, we will explore optimization techniques using the Adam Optimizer.
