Tuning Hyperparameters
In a practical setting, you can optimize the training process by adjusting the batch size, the learning rate, and the number of epochs.
Revisiting Hyperparameters
| Hyperparameter | Description |
| --- | --- |
| Batch Size | Amount of data the model processes in one training step |
| Learning Rate | How much the model adjusts its weights at each update |
| Number of Epochs | Number of complete passes through the entire dataset |
Learning Rate
The learning rate determines how large a step the model takes each time it updates its weights. If the learning rate is too high, the model may 'overshoot' the optimal solution and diverge; if it is too low, training may take too long to converge or may stall before reaching a good solution.
Learning rates are typically small values between 0 and 1, such as 0.1, 0.01, 0.001, and 0.0001.
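As a concrete illustration, here is a minimal sketch of setting the learning rate when creating an optimizer. It assumes PyTorch; the tiny model is a placeholder, and most frameworks expose an equivalent parameter (often named `lr` or `learning_rate`).

```python
import torch

# A tiny placeholder model: one linear layer mapping 10 features to 1 output
model = torch.nn.Linear(10, 1)

# lr is the learning rate; 0.01 is a common starting point.
# Too high (e.g., 1.0) risks divergence; too low (e.g., 1e-6) trains very slowly.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```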
Batch Size
This refers to the amount of data used for one training step. For example, if the batch size is 32, the model is trained on 32 data points at a time.
Commonly used batch sizes are powers of two, such as 16, 32, 64, 128, 256, and 512. A larger batch size can speed up the training process but increases memory usage.
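Continuing the PyTorch-based sketch, the batch size is typically passed to the data loader, which then serves the dataset in chunks of that size. The random data below is a placeholder standing in for a real dataset.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data: 1,000 samples with 10 features each, plus dummy targets
features = torch.randn(1000, 10)
targets = torch.randn(1000, 1)
dataset = TensorDataset(features, targets)

# batch_size=32 means each training step sees 32 samples at once
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch_features, batch_targets in loader:
    print(batch_features.shape)  # torch.Size([32, 10])
    break
```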
Number of Epochs
One epoch is one complete pass over the entire training dataset, meaning the model sees every training example once per epoch.
If the number of epochs is too few, the model may not learn enough and perform poorly, whereas too many epochs may cause the model to memorize the training data (overfitting), leading to poor performance on new data.
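Putting the three hyperparameters together, a typical training loop runs through the whole dataset once per epoch. The sketch below continues the PyTorch example; the specific values (learning rate 0.01, batch size 32, 10 epochs) are illustrative starting points, not tuned settings.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data: 1,000 samples, 10 features, 1 regression target
dataset = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # batch size

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)    # learning rate
loss_fn = torch.nn.MSELoss()

num_epochs = 10                                             # number of epochs
for epoch in range(num_epochs):
    for batch_features, batch_targets in loader:
        optimizer.zero_grad()                   # clear gradients from the previous step
        loss = loss_fn(model(batch_features), batch_targets)
        loss.backward()                         # compute gradients
        optimizer.step()                        # update weights using the learning rate
    print(f"epoch {epoch + 1}: loss = {loss.item():.4f}")   # loss on the last batch
```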