Hyperparameters in Determining How to Train AI

In this lesson, we will review the hyperparameters that we learned about in previous lessons.

Just like when you make a study plan for preparing for an exam, deciding the study time, rest periods, and study methods can significantly affect your grades.

Similarly, when training an AI model, you set hyperparameters to determine how the model learns.

Hyperparameters are configuration values that control how a model learns. Unlike the model's internal parameters, which are learned from data, hyperparameters are chosen by the person training the model and must be set before training begins.
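As a sketch, these values are often gathered into a simple configuration before training starts. The names and values below are illustrative, not tied to any particular library:

```python
# Hypothetical hyperparameter configuration, chosen before training begins.
hyperparams = {
    "learning_rate": 0.01,  # step size for each parameter update
    "batch_size": 32,       # samples processed per update
    "epochs": 10,           # full passes over the dataset
}

def describe(hp):
    """Return a short human-readable summary of a configuration."""
    return (f"lr={hp['learning_rate']}, "
            f"batch={hp['batch_size']}, epochs={hp['epochs']}")

print(describe(hyperparams))
```

Keeping the settings in one place like this makes it easy to adjust them between training runs.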


When training proceeds well and the model's error steadily settles toward a stable minimum, the model is said to have achieved convergence.

On the other hand, when the error grows or oscillates instead of settling, often because of poorly chosen hyperparameters such as too high a learning rate, training is said to diverge.

Just as cramming too much material into a short period can cause confusion and lower exam scores, a well-chosen learning strategy is essential when training AI.

The main hyperparameters that make up an AI learning strategy are as follows:


Learning Rate

The learning rate determines how much the model will change with each iteration, similar to the speed at which a student learns new information.

If the student tries to learn too quickly (high learning rate), they might not fully understand the information, and if they learn too slowly (low learning rate), learning can be slow and inefficient.

Similarly, if the learning rate is too high, the model may overshoot the optimal solution or become unstable, and if it's too low, the learning speed can be slow.
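This effect can be seen in a minimal gradient-descent sketch on the toy function f(x) = x², whose minimum is at x = 0. The gradient is f'(x) = 2x, and each step moves x by -learning_rate × gradient:

```python
# Minimal gradient-descent sketch on f(x) = x**2 (minimum at x = 0).
def gradient_descent(x0, learning_rate, steps):
    x = x0
    for _ in range(steps):
        grad = 2 * x                 # derivative of x**2
        x = x - learning_rate * grad # step opposite the gradient
    return x

# A moderate learning rate settles close to the minimum:
print(gradient_descent(1.0, 0.1, 50))

# Too high a learning rate overshoots; the value grows without bound:
print(gradient_descent(1.0, 1.5, 50))

# Too low a learning rate barely moves in the same number of steps:
print(gradient_descent(1.0, 0.001, 50))
```

With learning rate 0.1 the result is very close to 0; with 1.5 each step overshoots the minimum and the value explodes (divergence); with 0.001 the value is still near the starting point after 50 steps.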


Batch Size

The batch size determines the amount of data the model processes at once, comparable to how much a student studies at one time.

Studying too much at once (large batch size) might reduce total study time but can affect concentration, while studying too little at once (small batch size) can prolong total study time.

Similarly, a large batch size can increase learning speed but requires more computing resources, whereas a small batch size uses fewer resources but may take longer to train.
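Concretely, the batch size determines how the dataset is split into chunks, and therefore how many parameter updates happen per pass. A small sketch (the helper name is illustrative):

```python
# Sketch: splitting a dataset into mini-batches of a given size.
def make_batches(data, batch_size):
    """Split data into consecutive chunks of at most batch_size items."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

dataset = list(range(10))

# batch_size 4 -> 3 batches, the last one smaller than the rest
print(make_batches(dataset, 4))

# A smaller batch size means more batches, i.e. more updates per pass
print(len(make_batches(dataset, 2)))
```

Each batch triggers one parameter update, so a smaller batch size means more updates (and more time) per pass, while a larger one means fewer, heavier updates.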


Number of Epochs

An epoch is one complete pass through the entire training dataset, and the number of epochs determines how many times that pass is repeated. It's comparable to how many times a student goes through the entire textbook.

Going through the textbook multiple times (many epochs) can lead to better understanding but might also cause over-reliance on the textbook's knowledge. On the other hand, going over it too few times (few epochs) might lead to insufficient understanding.

Similarly, too many epochs can result in overfitting, while too few can result in inadequate learning.
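The three hyperparameters come together in the structure of a training loop: each epoch repeats the full dataset, one batch at a time, and every batch produces one parameter update. A sketch that only counts the updates:

```python
# Sketch of a training loop's structure: an epoch is one full pass
# over every batch, and each batch produces one parameter update.
def count_updates(num_samples, batch_size, epochs):
    """Total parameter updates performed across all epochs."""
    batches_per_epoch = -(-num_samples // batch_size)  # ceiling division
    updates = 0
    for _ in range(epochs):                  # repeat the dataset `epochs` times
        for _ in range(batches_per_epoch):   # one update per batch
            updates += 1
    return updates

# 1000 samples, batch size 32 -> 32 batches per epoch; 10 epochs -> 320 updates
print(count_updates(num_samples=1000, batch_size=32, epochs=10))
```

This makes the trade-off concrete: increasing the number of epochs multiplies the total work, which is why too many epochs cost time and risk overfitting while too few leave the model undertrained.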


Since hyperparameters greatly influence the model's learning process and performance, they should be set and adjusted thoughtfully, much like a person's learning approach.

In the next lesson, we will simulate the fine-tuning process.
