
Predicting with Multiple Decision Trees Using Random Forest

Random Forest is a machine learning algorithm that combines multiple decision trees to make more accurate and reliable predictions.

A single decision tree can easily overfit the training data, but Random Forest improves generalization by combining the outputs of many trees.

For instance, a weather prediction model can have multiple decision trees, each learning different patterns, and combine their predictions to make the final weather forecast.

Example of Predictions by Random Forest
Decision Tree 1: Probability of rain tomorrow 60%
Decision Tree 2: Probability of rain tomorrow 70%
Decision Tree 3: Probability of rain tomorrow 65%

Final Prediction (Random Forest): 65% probability of rain, the average of the three trees' outputs ((60% + 70% + 65%) / 3 = 65%)

In this way, Random Forest is a type of Ensemble Learning technique, which improves prediction performance by combining several models.

Ensemble Learning refers to machine learning techniques where multiple models are combined to achieve better performance than a single model.
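To make this concrete, here is a minimal sketch using scikit-learn's RandomForestClassifier. The Iris dataset and the parameter values are illustrative choices, not part of the lesson:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A small example dataset (illustrative choice)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of 100 decision trees
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
```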


Learning Method of Random Forest

Random Forest is trained through the following three-step process:


1. Data Sampling

Create multiple training datasets (subsets) by randomly sampling from the original data with replacement, a procedure known as bootstrap sampling.

This ensures that each decision tree learns different data, enhancing the diversity of the model.
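As a small illustration of this idea (a sketch using NumPy; the array below is a stand-in for a real training set):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
X = np.arange(10)  # stand-in for the original dataset

# Bootstrap sample: draw len(X) items *with replacement*
bootstrap_indices = rng.integers(0, len(X), size=len(X))
subset = X[bootstrap_indices]

print(subset)  # some items appear more than once, others not at all
```

Because sampling is done with replacement, each subset repeats some records and omits others, so every tree ends up with a slightly different view of the data.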


2. Training Multiple Decision Trees

Each sampled dataset is used to independently train its own decision tree.

During this process, each tree sees only part of the original data and chooses its split criteria from a randomly selected subset of features, as shown in the sketch below.
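The following sketch hand-builds a miniature forest of three trees to show both ideas: each tree gets its own bootstrap sample, and max_features="sqrt" limits every split to a random subset of features. The dataset and the number of trees are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(seed=0)

trees = []
for i in range(3):
    # Each tree trains on its own bootstrap sample of the data ...
    idx = rng.integers(0, len(X), size=len(X))
    # ... and considers only a random subset of features at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)
    print(f"Tree {i + 1} saw {len(np.unique(idx))} of {len(X)} unique samples")
```

In practice, RandomForestClassifier performs these two steps internally; the manual loop is only there to show the mechanics.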


3. Combining Predictions

When new data is input, all trees individually make predictions, and the final prediction is determined as follows:

  • For classification problems: Determine the final class using Majority Voting.

  • For regression problems: Calculate the average of predictions made by multiple trees to determine the final prediction.

This results in more stable and generalized predictions compared to a single decision tree.
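Both combination rules are simple to express in code. The per-tree predictions below are made-up numbers (the regression values reuse the rain example from earlier):

```python
import numpy as np

# Classification: hypothetical class predictions from five trees for one sample
tree_votes = np.array([1, 0, 1, 1, 0])
final_class = np.bincount(tree_votes).argmax()  # majority voting
print("Majority vote:", final_class)            # -> 1

# Regression: hypothetical outputs from three trees (the rain example)
tree_outputs = np.array([0.60, 0.70, 0.65])
print("Average prediction:", tree_outputs.mean())  # -> 0.65
```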


Advantages and Limitations of Random Forest

Random Forest is a robust machine learning algorithm that helps prevent overfitting while delivering strong predictive performance.

It handles both numerical and categorical data effectively and is relatively robust to outliers.

Additionally, Random Forest computes the importance of each feature as part of training, making it easy to see which variables matter most.
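For example, scikit-learn exposes these scores through the feature_importances_ attribute after training (a minimal sketch on the Iris dataset, which is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data.data, data.target)

# feature_importances_ is filled in automatically during training
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```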

However, training and predicting with many decision trees increases computational cost, and the combined model is harder to interpret than a single tree.

Also, in real-time prediction scenarios, having to evaluate many trees for every input can be a speed disadvantage.


So far, we have explored the basic algorithms in machine learning, including linear regression, logistic regression, decision trees, and random forest.

In the next lesson, we'll try a simple quiz based on what we've learned so far.
