
Predicting with Questions - Decision Trees

A Decision Tree is a machine learning algorithm that finds answers by sequentially asking questions to classify or predict data.

Much like the game of Twenty Questions, it reaches a final conclusion through multiple conditions.

For example, imagine predicting whether a patient has a specific disease.

  1. Do you have a fever? → Yes → Do you have a cough? → Yes → High likelihood of flu

  2. Do you have a fever? → No → High likelihood of allergy

As seen here, a decision tree functions by following a sequence of questions and answers to classify data.
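The question sequence above can be written out by hand as a plain if/else chain. This is only a sketch of the example's logic, not a learned model; the function name and labels are illustrative.

```python
def diagnose(has_fever: bool, has_cough: bool) -> str:
    """Follow the question path: fever -> cough -> diagnosis."""
    if has_fever:
        if has_cough:
            return "flu"          # fever + cough
        return "common cold"      # fever, no cough
    return "allergy"              # no fever

print(diagnose(True, True))    # flu
print(diagnose(False, False))  # allergy
```

A Decision Tree automates exactly this: instead of a programmer hand-writing the conditions, the algorithm chooses which question to ask at each step from the data.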


Structure of a Decision Tree

A Decision Tree consists of Nodes and Branches.

  • Root Node: The very first question asked

  • Internal Node: Intermediate question

  • Leaf Node: Final outcome

Decision Tree Example

           (Do you have a fever?)
              /             \
            Yes              No
            /                 \
  (Do you have a cough?)    Allergy
        /        \
      Yes         No
      /            \
    Flu       Common Cold

A Decision Tree automatically learns how to split data and creates its structure by finding the most suitable questions.
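As a minimal sketch of that automatic learning, the tiny symptom dataset can be fit with scikit-learn (assuming it is installed); the feature encoding `[fever, cough]` and the labels are illustrative.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row is [fever, cough] with 1 = yes, 0 = no.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = ["flu", "common cold", "allergy", "allergy"]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)  # the tree finds the splits on its own

# Print the learned question structure as text.
print(export_text(tree, feature_names=["fever", "cough"]))
print(tree.predict([[1, 1]]))  # ['flu']
```

With only four samples the learned tree reproduces the diagram above: it first asks about fever, then about cough.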


Learning Method of Decision Trees

A Decision Tree learns by repeatedly splitting the data on candidate conditions.

To decide which split is best at each step, it uses a criterion such as Information Gain or Gini Impurity.


1. Information Gain

Information Gain evaluates how much uncertainty is reduced after data is split.

For example, if the question "Do you have a fever?" allows you to classify data more clearly than before the split, then the information gain is considered high.
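Information gain can be computed by hand as the drop in Shannon entropy from parent to children. The tiny label lists below are illustrative, matching the fever example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

parent    = ["flu", "flu", "allergy", "allergy"]  # before the split
fever_yes = ["flu", "flu"]                        # fever = yes
fever_no  = ["allergy", "allergy"]                # fever = no

# Gain = parent entropy - weighted average of child entropies.
gain = entropy(parent) - (
    len(fever_yes) / len(parent) * entropy(fever_yes)
    + len(fever_no) / len(parent) * entropy(fever_no)
)
print(gain)  # 1.0 -> the split removes all uncertainty
```

Here both children end up pure, so the gain equals the full parent entropy of 1 bit; the tree would rate this question as an ideal split.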


2. Gini Impurity

Gini Impurity indicates how mixed the data is.

A value of 0 means a node contains only a single class, while a higher value implies that multiple classes are mixed together.

Decision Trees learn in the direction that minimizes Gini Impurity.
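Gini Impurity is simple enough to compute directly: 1 minus the sum of squared class proportions. The label lists below are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes mix."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["flu", "flu", "flu"]))  # 0.0 -> pure node
print(gini(["flu", "allergy"]))     # 0.5 -> evenly mixed, worst 2-class case
```

A tree picks, at each node, the split whose resulting children have the lowest weighted Gini Impurity.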


Advantages and Limitations of Decision Trees

A Decision Tree is an intuitive, easy-to-understand machine learning algorithm.

However, like any algorithm, it has drawbacks as well as advantages.

Let's summarize the main considerations when using Decision Trees.


Advantages of Decision Trees

Decision Trees require minimal preprocessing: features do not need to be scaled or normalized, and the same algorithm supports both classification and regression tasks.

They can also work with categorical data such as "male/female" or "spam/normal" alongside numerical data without issue.


Limitations of Decision Trees

If a decision tree becomes too deep, it might overfit the training data and not perform well on new data.

To prevent this, techniques like Pruning can be used to remove unnecessary branches.

Moreover, as the amount of data increases, finding the optimal split can involve numerous computations, significantly slowing down the process.
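As a sketch of how pruning is applied in practice, scikit-learn (assumed installed) exposes both pre-pruning limits and cost-complexity post-pruning; the hyperparameter values below are illustrative, not tuned.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

pruned = DecisionTreeClassifier(
    max_depth=3,         # cap how many questions may be chained
    min_samples_leaf=5,  # each leaf must cover at least 5 samples
    ccp_alpha=0.01,      # cost-complexity pruning strength
    random_state=0,
)
pruned.fit(X, y)
print(pruned.get_depth())  # at most 3
```

Limiting depth and leaf size keeps the tree from memorizing the training data, at the cost of some flexibility on the training set itself.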


Decision Trees are powerful, intuitive algorithms but come with drawbacks like overfitting issues and sensitivity to data changes.

In the next lesson, we will tackle a simple quiz using the concepts we've learned so far.
