Label Encoding and One-Hot Encoding
In this lesson, we will learn about Label Encoding and One-Hot Encoding, two methods for converting categorical data into numerical data during data preprocessing.
1. Label Encoding
This method assigns each category its own integer, so the same category always maps to the same number.
| Student Name | Favorite Subject | Label Encoding Value |
|--------------|------------------|----------------------|
| John | Math | 0 |
| Emily | English | 1 |
| Sarah | Science | 2 |
| Mike | Math | 0 |
Label encoding is simple and efficient: each category becomes a single number, and no new columns are added.
However, it can misleadingly imply that the size (order) of the numbers is meaningful when it is not.
For example, the data above might suggest that Math(0) < English(1) < Science(2) reflects an order of importance.
A model trained on these values may treat that order as real, which can lead to incorrect predictions.
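The mapping in the table above can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming labels are handed out in order of first appearance (which is what the table shows); the names `subjects`, `label_map`, and `encoded` are illustrative and not part of any library.

```python
# Favorite subjects from the table above, in row order.
subjects = ["Math", "English", "Science", "Math"]

# Give each new category the next unused integer, in order of first appearance.
label_map = {}
for subject in subjects:
    if subject not in label_map:
        label_map[subject] = len(label_map)

# Replace every subject with its integer label.
encoded = [label_map[subject] for subject in subjects]

print(label_map)  # {'Math': 0, 'English': 1, 'Science': 2}
print(encoded)    # [0, 1, 2, 0]
```

Library encoders may assign the integers differently. For example, scikit-learn's `LabelEncoder` sorts the categories first, so the specific numbers can differ from the table even though the idea is the same.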
2. One-Hot Encoding
This method creates a new "column" for each category and places a 1 in the corresponding location.
| Student Name | Favorite Subject | Math | English | Science |
|--------------|------------------|------|---------|---------|
| John | Math | 1 | 0 | 0 |
| Emily | English | 0 | 1 | 0 |
| Sarah | Science | 0 | 0 | 1 |
| Mike | Math | 1 | 0 | 0 |
One-hot encoding reduces misinterpretation arising from the size (order) of numbers by using only 0 and 1.
However, it adds one new column per category, so a feature with many categories can make the dataset very large.
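The one-hot table above can be sketched the same way. This example assumes the columns follow the order in which categories first appear; `categories` and `one_hot` are illustrative names, and in practice a library helper such as `pandas.get_dummies` is typically used instead.

```python
# Favorite subjects from the table above, in row order.
subjects = ["Math", "English", "Science", "Math"]

# Distinct categories in order of first appearance; each becomes one column.
categories = list(dict.fromkeys(subjects))

# For each row, put a 1 in the column that matches its category and 0 elsewhere.
one_hot = [
    [1 if subject == category else 0 for category in categories]
    for subject in subjects
]

print(categories)  # ['Math', 'English', 'Science']
for subject, row in zip(subjects, one_hot):
    print(subject, row)
# Math [1, 0, 0]
# English [0, 1, 0]
# Science [0, 0, 1]
# Math [1, 0, 0]
```

Each row contains exactly one 1, and the number of columns equals the number of distinct categories, which is why the table grows as categories are added.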
Which Should You Use?
✔ Label Encoding: Not recommended for unordered data like subjects, because the numbers imply an order that does not exist.
✔ One-Hot Encoding: More suitable for unordered data, but can be inefficient if there are too many categories.
👉 Generally, if order is not important, one-hot encoding is more commonly used.
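One way to see the difference is to compare how far apart the categories look under each encoding. The sketch below is only an illustration: it uses the three subjects from this lesson, and the helper `squared_distance` and the dictionaries `label_gaps` and `one_hot_gaps` are hypothetical names introduced here.

```python
from itertools import combinations

categories = ["Math", "English", "Science"]

# Label encoding: one integer per category.
label = {category: index for index, category in enumerate(categories)}

# One-hot encoding: one 0/1 vector per category.
one_hot = {
    category: [1 if i == index else 0 for i in range(len(categories))]
    for index, category in enumerate(categories)
}

def squared_distance(u, v):
    """Sum of squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

label_gaps = {}
one_hot_gaps = {}
for first, second in combinations(categories, 2):
    label_gaps[(first, second)] = abs(label[first] - label[second])
    one_hot_gaps[(first, second)] = squared_distance(one_hot[first], one_hot[second])
    print(first, "vs", second, "| label gap:", label_gaps[(first, second)],
          "| one-hot squared distance:", one_hot_gaps[(first, second)])
# Math vs English | label gap: 1 | one-hot squared distance: 2
# Math vs Science | label gap: 2 | one-hot squared distance: 2
# English vs Science | label gap: 1 | one-hot squared distance: 2
```

With label encoding, Math appears twice as far from Science as from English, even though the subjects have no such relationship. With one-hot encoding, every pair of subjects is equally far apart.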
In the next lesson, we will tackle a simple quiz to review what we have learned so far.