Python + AI for Geeks

Label Encoding and One-Hot Encoding

In this lesson, we will learn about Label Encoding and One-Hot Encoding, methods for converting categorical data into numerical data during data preprocessing.


1. Label Encoding

This method assigns each category a unique integer.

Label Encoding Example
| Student Name | Favorite Subject | Label Encoding Value |
|--------------|------------------|----------------------|
| John | Math | 0 |
| Emily | English | 1 |
| Sarah | Science | 2 |
| Mike | Math | 0 |
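The table above can be reproduced with a few lines of plain Python. This is a minimal sketch: `label_encode` is a hypothetical helper, and it numbers categories in order of first appearance so that the result matches the table. Libraries may number categories differently. For example, scikit-learn's `LabelEncoder` sorts the categories, which would give English = 0, Math = 1, Science = 2.

```python
def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused integer
    codes = [mapping[value] for value in values]
    return codes, mapping


subjects = ["Math", "English", "Science", "Math"]  # John, Emily, Sarah, Mike
codes, mapping = label_encode(subjects)
print(mapping)  # {'Math': 0, 'English': 1, 'Science': 2}
print(codes)    # [0, 1, 2, 0]
```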

Label encoding is simple and efficient: each category becomes a single integer, so the feature still fits in one column.

However, the numbers can misleadingly imply that the categories have a meaningful size or order when they do not.

For example, the table above might suggest that Math (0) < English (1) < Science (2), as though the subjects were ranked by importance.

A model trained on these values may learn this nonexistent order and make incorrect predictions as a result.
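To see how this can mislead a model, compare how far apart the encoded subjects are. This sketch assumes a model that judges similarity by numeric distance (as k-nearest neighbors does, for instance); `distance` is a hypothetical helper. The codes make Math look twice as far from Science as from English, even though no such relationship exists between the subjects.

```python
# Label-encoded values from the table above
encoded = {"Math": 0, "English": 1, "Science": 2}


def distance(a, b):
    """Absolute difference between the codes of two subjects."""
    return abs(encoded[a] - encoded[b])


print(distance("Math", "English"))  # 1
print(distance("Math", "Science"))  # 2: Math now looks "farther" from Science
```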


2. One-Hot Encoding

This method creates a new column for each category; each row gets a 1 in the column for its own category and a 0 in every other one.

One-Hot Encoding Example
| Student Name | Favorite Subject | Math | English | Science |
|--------------|------------------|------|---------|---------|
| John | Math | 1 | 0 | 0 |
| Emily | English | 0 | 1 | 0 |
| Sarah | Science | 0 | 0 | 1 |
| Mike | Math | 1 | 0 | 0 |
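A plain-Python sketch of the same transformation follows. `one_hot_encode` is a hypothetical helper; it orders the new columns by first appearance so the output matches the table above.

```python
def one_hot_encode(values):
    """Return the list of categories (columns) and one 0/1 row per value."""
    categories = []
    for value in values:
        if value not in categories:
            categories.append(value)  # one new column per distinct category
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


subjects = ["Math", "English", "Science", "Math"]  # John, Emily, Sarah, Mike
columns, rows = one_hot_encode(subjects)
print(columns)  # ['Math', 'English', 'Science']
for row in rows:
    print(row)
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
# [1, 0, 0]
```

In practice, pandas offers `pd.get_dummies` for this task; check its documentation for how it names and orders the generated columns.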

Because each category gets its own column holding only 0 or 1, one-hot encoding does not assign the categories any size or order, which avoids the misinterpretation described above.
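Repeating the distance check with one-hot vectors shows why: every pair of different subjects ends up exactly the same distance apart, so none looks closer to or "larger" than another. This is a minimal sketch assuming Euclidean distance as the similarity measure (`math.dist` requires Python 3.8 or later).

```python
import itertools
import math

# One-hot vectors from the table above (columns: Math, English, Science)
one_hot = {
    "Math": [1, 0, 0],
    "English": [0, 1, 0],
    "Science": [0, 0, 1],
}

# Euclidean distance between every pair of distinct subjects
for a, b in itertools.combinations(one_hot, 2):
    print(a, b, round(math.dist(one_hot[a], one_hot[b]), 3))
# Math English 1.414
# Math Science 1.414
# English Science 1.414
```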

However, it adds one column per category, so a feature with many distinct categories can produce a very wide dataset that is mostly zeros.
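A quick way to estimate this cost is to count the distinct categories, since each one becomes a column. The sketch below uses a hypothetical `one_hot_width` helper and made-up data (1,000 distinct zip codes) purely for illustration.

```python
def one_hot_width(values):
    """Number of columns one-hot encoding would add: one per distinct category."""
    return len(set(values))


subjects = ["Math", "English", "Science", "Math"]
print(one_hot_width(subjects))  # 3

# Hypothetical feature with many distinct values: 1,000 different zip codes
zip_codes = [str(10000 + i) for i in range(1000)]
print(one_hot_width(zip_codes))  # 1000
# 1,000 rows x 1,000 columns = 1,000,000 cells, and only 1,000 of them are 1
print(len(zip_codes) * one_hot_width(zip_codes))  # 1000000
```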


Which Should You Use?

✔ Label Encoding: Not recommended for unordered data such as subjects, because the numbers imply an order that does not exist.

✔ One-Hot Encoding: Better suited to unordered data, but can be inefficient when there are many categories, since each one adds a column.


👉 In general, when the categories have no natural order, one-hot encoding is the more common choice.
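When the categories do have a natural order, integer codes can carry real information, which is why label-style encoding is not ruled out in that case. A minimal sketch, assuming a hypothetical ordered feature such as course level; the order is written out explicitly rather than left to first appearance or alphabetical sorting.

```python
# Hypothetical ordered feature: the list itself defines the order
LEVELS = ["Beginner", "Intermediate", "Advanced"]
LEVEL_CODE = {level: i for i, level in enumerate(LEVELS)}


def encode_levels(values):
    """Encode ordered categories so that a larger number means a higher level."""
    return [LEVEL_CODE[value] for value in values]


print(encode_levels(["Advanced", "Beginner", "Intermediate"]))  # [2, 0, 1]
```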

In the next lesson, we will tackle a simple quiz to review what we have learned so far.
