Feature Selection and Dimensionality Reduction

Datasets used in machine learning and AI often contain many features (characteristics).

Consider, for example, predicting whether a student will be admitted to college.

Student data might include features such as:

  • Grades (Math, Science, History scores)
  • Attendance rate
  • Extracurricular activities
  • Athletic ability
  • Social media usage time
  • Reading volume

But do all of these factors really influence college admissions?

Athletic ability or social media usage time might not significantly impact acceptance predictions.

Thus, when training AI models, we can employ Feature Selection, which involves choosing only the important features, or Dimensionality Reduction, which combines multiple features into a smaller set of new features.


1. Feature Selection

Feature selection is the process of identifying the features that actually matter for the prediction and keeping only those.

In other words, it's about keeping the data most likely to influence college acceptance while discarding the rest.


Example of Feature Selection

  • "Math, Science, History scores" → likely important ✅
  • "Attendance rate" → could be important ✅
  • "Athletic ability" → not very important ❌
  • "Social media usage time" → might not be relevant ❌

By applying feature selection, we can retain only important information and discard unnecessary data.

This can speed up computation and improve prediction accuracy.
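As a rough sketch of what this can look like in code, the example below uses scikit-learn's SelectKBest to keep the two features most associated with the admission label. The numbers and column meanings are made up purely for illustration:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical student data, one row per student:
# [math, science, attendance_rate, athletics, social_media_hours]
X = np.array([
    [85, 80, 0.95, 3, 4.0],
    [72, 68, 0.80, 9, 6.5],
    [90, 95, 0.98, 2, 1.5],
    [60, 55, 0.70, 8, 7.0],
    [88, 91, 0.92, 4, 2.0],
    [65, 60, 0.75, 7, 5.5],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = admitted, 0 = not admitted

# Score every feature against the label (ANOVA F-test) and keep the best k=2
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print("Scores per feature:", selector.scores_)
print("Kept feature indices:", selector.get_support(indices=True))
print("Reduced data shape:", X_selected.shape)  # (6, 2)
```

With data like this, features such as athletics or social media hours would typically receive low scores and be dropped, matching the intuition above.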


2. Dimensionality Reduction

Dimensionality reduction involves combining several features into a smaller number of new features.

While feature selection is about "discarding the unnecessary," dimensionality reduction is about "merging similar features."


Example of Dimensionality Reduction

"Math score + Science score + History score" → consolidate into a single "Academic Achievement" score ✅

This approach allows us to maintain as much information as possible while reducing the number of features to handle.
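As a quick illustration of such a manual merge, here is how it might look with pandas; the column names and scores are hypothetical:

```python
import pandas as pd

# Hypothetical subject scores, one row per student
df = pd.DataFrame({
    "math": [85, 72, 90],
    "science": [80, 68, 95],
    "history": [78, 75, 88],
})

# Merge the three subject scores into a single "academic_achievement" feature
df["academic_achievement"] = df[["math", "science", "history"]].mean(axis=1)
print(df)
```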

A popular method for this is PCA (Principal Component Analysis).
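Here is a minimal sketch of the same idea using scikit-learn's PCA with made-up scores. Instead of a simple average, PCA finds the combination of the three correlated score columns that preserves the most variation in the data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical subject scores: [math, science, history]
scores = np.array([
    [85, 80, 78],
    [72, 68, 75],
    [90, 95, 88],
    [60, 55, 62],
    [88, 91, 85],
])

# Compress the three correlated columns into one principal component
pca = PCA(n_components=1)
academic_achievement = pca.fit_transform(scores)

print(academic_achievement.ravel())  # one combined value per student
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio tells us how much of the original information the single new feature retains.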


To summarize these concepts:

✅ Feature Selection: Keeping only the important features (removing the unnecessary ones)

✅ Dimensionality Reduction: Merging similar features to reduce quantity

In the next session, we will explore Labels, which play the role of answers in supervised learning.
