Just Remember Three: Learning Rate, Batch Size, Epochs
You don’t need to remember every hyperparameter introduced earlier. In most cases, when fine-tuning, just remember three: learning rate, batch size, and number of epochs.
Even when fine-tuning AI models on the OpenAI platform, these are the only three hyperparameters you need to set.
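For example, here is a minimal sketch of passing these three hyperparameters to a fine-tuning job with the OpenAI Python SDK. The training file ID and model name are placeholders, and the exact field names may vary between SDK versions:

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Placeholder file ID and model name; substitute your own values.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={
        "learning_rate_multiplier": 0.1,  # learning rate (as a multiplier)
        "batch_size": 8,                  # batch size
        "n_epochs": 3,                    # number of epochs
    },
)
print(job.id)
```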
Review of the Three Key Hyperparameters
1. Learning Rate
The learning rate determines the speed at which the AI model learns.
When a model learns from data, it adjusts weights, which represent how important each element is for making predictions.
For example, a model predicting house prices takes inputs like the area, location, and number of rooms. The weights for these inputs reflect their importance in determining the price.
The learning rate decides how quickly or slowly the weights are adjusted as the model learns. A high learning rate means a large adjustment, while a low learning rate means a small adjustment (see the sketch after the list below).
- Low Learning Rate: Learning proceeds slowly, and the model needs more time to find the optimal weights. However, it is less likely to stray far from the correct path, allowing for stable learning.
- High Learning Rate: Learning proceeds quickly but may overshoot the optimal weights or become unstable, with a higher risk of straying far from the correct path.
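To make this concrete, the toy sketch below minimizes a simple squared error with gradient descent (all numbers are illustrative, not from a real model). The low learning rate converges steadily toward the target weight, while the overly high one overshoots on every step and diverges:

```python
# Toy gradient descent on the squared error (w - target)^2.
# Illustrative numbers only; not a real model.
start_weight = 0.0
target = 2.0  # the "optimal" weight we want to find

for learning_rate in (0.1, 1.9):  # low vs. high learning rate
    w = start_weight
    for step in range(10):
        gradient = 2 * (w - target)    # derivative of (w - target)^2
        w -= learning_rate * gradient  # the update is scaled by the learning rate
    print(f"learning rate {learning_rate}: weight after 10 steps = {w:.1f} (target {target})")
```

With the rate of 0.1, the weight ends near 2.0; with 1.9, it ends tens of thousands of units away from the target.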
2. Batch Size
Batch size refers to the amount of data the model learns from at once. Breaking the data into smaller batches helps manage computer memory usage and can speed up learning (see the sketch after the list below).
- Small Batch Size: The model updates its weights frequently and uses fewer computing resources (memory). Learning can be finer-grained but may proceed slowly.
- Large Batch Size: The model updates its weights less frequently and uses more computing resources. Learning can proceed quickly, but the model may generalize worse, reducing performance on new data.
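The relationship between batch size and update frequency is easy to see in code. This sketch (with an illustrative dataset size) simply slices a dataset into batches and counts how many weight updates one pass would trigger:

```python
# Toy sketch: slicing a dataset into mini-batches (illustrative sizes).
data = list(range(100))  # stand-in for 100 training examples

for batch_size in (10, 50):
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    # Each batch triggers one weight update, so a smaller batch size
    # means more frequent updates per pass over the data.
    print(f"batch size {batch_size}: {len(batches)} weight updates per epoch")
```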
3. Number of Epochs
The number of epochs indicates how many times the model will go through the entire dataset. One epoch means the model has gone through all the training data once.
Having more epochs allows the model to learn better through repeated exposure to the data, but too many repetitions can lead the model to memorize the training data rather than understand it, a situation known as overfitting. (The loop sketch after the list below shows where the epoch count fits in.)
- Few Epochs: The model may not learn enough, resulting in reduced performance.
- Many Epochs: The model may overfit by learning too much from the training data, causing decreased performance on new, unseen data.
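Putting the three together, the toy loop below (plain Python, illustrative values, with a made-up gradient in place of a real model) shows where each hyperparameter appears in a typical training loop: the number of epochs sets the outer loop, the batch size sets the inner loop, and the learning rate scales every weight update:

```python
# Toy training loop tying the three hyperparameters together.
# The gradient is made up; a real model would compute it from its loss.
learning_rate = 0.01
batch_size = 10
num_epochs = 3

data = list(range(100))  # stand-in for 100 training examples
weight = 0.0

for epoch in range(num_epochs):  # one epoch = one full pass over the data
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        gradient = 0.001 * sum(batch) / len(batch)  # fake gradient for illustration
        weight -= learning_rate * gradient          # update scaled by the learning rate
    print(f"epoch {epoch + 1}: weight = {weight:.4f}")
```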