
Predicting with Questions - Decision Trees

A Decision Tree is a machine learning algorithm that finds answers by sequentially asking questions to classify or predict data.

Much like the game of Twenty Questions, it reaches a final conclusion through multiple conditions.

For example, imagine predicting whether a patient has a specific disease.

  1. Do you have a fever? → Yes → Do you have a cough? → Yes → High likelihood of flu

  2. Do you have a fever? → No → High likelihood of allergy

As seen here, a decision tree functions by following a sequence of questions and answers to classify data.
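The question sequence above can be written out by hand as a plain if/else chain. This is only a sketch of the example's logic, not a learned model; the function name and labels are illustrative.

```python
def diagnose(has_fever: bool, has_cough: bool) -> str:
    """Follow the question path: fever -> cough -> diagnosis."""
    if has_fever:
        if has_cough:
            return "flu"          # fever + cough
        return "common cold"      # fever, no cough
    return "allergy"              # no fever

print(diagnose(True, True))    # flu
print(diagnose(False, False))  # allergy
```

A Decision Tree automates exactly this: instead of a programmer hand-writing the conditions, the algorithm chooses which question to ask at each step from the data.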


Structure of a Decision Tree

A Decision Tree consists of Nodes and Branches.

  • Root Node: The very first question asked

  • Internal Node: Intermediate question

  • Leaf Node: Final outcome

Decision Tree Example

           (Do you have a fever?)
              /             \
            Yes              No
            /                 \
  (Do you have a cough?)    Allergy
        /        \
      Yes         No
      /            \
    Flu       Common Cold

A Decision Tree automatically learns how to split data and creates its structure by finding the most suitable questions.
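As a minimal sketch of that automatic learning, the tiny symptom dataset can be fit with scikit-learn (assuming it is installed); the feature encoding `[fever, cough]` and the labels are illustrative.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row is [fever, cough] with 1 = yes, 0 = no.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = ["flu", "common cold", "allergy", "allergy"]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)  # the tree finds the splits on its own

# Print the learned question structure as text.
print(export_text(tree, feature_names=["fever", "cough"]))
print(tree.predict([[1, 1]]))  # ['flu']
```

With only four samples the learned tree reproduces the diagram above: it first asks about fever, then about cough.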


Learning Method of Decision Trees

A Decision Tree learns by repeatedly splitting the data on candidate conditions.

To decide which split is best at each step, it uses a criterion such as Information Gain or Gini Impurity.


1. Information Gain

Information Gain evaluates how much uncertainty is reduced after data is split.

For example, if the question "Do you have a fever?" allows you to classify data more clearly than before the split, then the information gain is considered high.
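Information gain can be computed by hand as the drop in Shannon entropy from parent to children. The tiny label lists below are illustrative, matching the fever example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

parent    = ["flu", "flu", "allergy", "allergy"]  # before the split
fever_yes = ["flu", "flu"]                        # fever = yes
fever_no  = ["allergy", "allergy"]                # fever = no

# Gain = parent entropy - weighted average of child entropies.
gain = entropy(parent) - (
    len(fever_yes) / len(parent) * entropy(fever_yes)
    + len(fever_no) / len(parent) * entropy(fever_no)
)
print(gain)  # 1.0 -> the split removes all uncertainty
```

Here both children end up pure, so the gain equals the full parent entropy of 1 bit; the tree would rate this question as an ideal split.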


2. Gini Impurity

Gini Impurity indicates how mixed the data is.

A value of 0 means a node contains only a single class, while a higher value implies that multiple classes are mixed together.

Decision Trees learn in the direction that minimizes Gini Impurity.
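Gini Impurity is simple enough to compute directly: 1 minus the sum of squared class proportions. The label lists below are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes mix."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["flu", "flu", "flu"]))  # 0.0 -> pure node
print(gini(["flu", "allergy"]))     # 0.5 -> evenly mixed, worst 2-class case
```

A tree picks, at each node, the split whose resulting children have the lowest weighted Gini Impurity.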


Advantages and Limitations of Decision Trees

A Decision Tree is an intuitive, easy-to-understand machine learning algorithm.

However, like any algorithm, it has drawbacks as well as advantages.

Let's summarize the main considerations when using Decision Trees.


Advantages of Decision Trees

Decision Trees require minimal preprocessing: features do not need to be scaled or normalized, and the same algorithm supports both classification and regression tasks.

They can also work with categorical data such as "male/female" or "spam/normal" alongside numerical data without issue.


Limitations of Decision Trees

If a decision tree becomes too deep, it might overfit the training data and not perform well on new data.

To prevent this, techniques like Pruning can be used to remove unnecessary branches.

Moreover, as the amount of data increases, finding the optimal split can involve numerous computations, significantly slowing down the process.
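As a sketch of how pruning is applied in practice, scikit-learn (assumed installed) exposes both pre-pruning limits and cost-complexity post-pruning; the hyperparameter values below are illustrative, not tuned.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

pruned = DecisionTreeClassifier(
    max_depth=3,         # cap how many questions may be chained
    min_samples_leaf=5,  # each leaf must cover at least 5 samples
    ccp_alpha=0.01,      # cost-complexity pruning strength
    random_state=0,
)
pruned.fit(X, y)
print(pruned.get_depth())  # at most 3
```

Limiting depth and leaf size keeps the tree from memorizing the training data, at the cost of some flexibility on the training set itself.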


Decision Trees are powerful, intuitive algorithms but come with drawbacks like overfitting issues and sensitivity to data changes.

In the next lesson, we will tackle a simple quiz using the concepts we've learned so far.
