Feature Selection and Dimensionality Reduction

Datasets used in machine learning and AI often contain many features (characteristics).

Consider, for example, predicting whether a student will be admitted to college.

Student data might include features such as:

  • Grades (Math, Science, History scores)
  • Attendance rate
  • Extracurricular activities
  • Athletic ability
  • Social media usage time
  • Reading volume

But do all of these factors really influence college admissions?

Athletic ability or social media usage time might not significantly impact acceptance predictions.

Thus, when training AI models, we can employ Feature Selection, which involves choosing only the important features, or Dimensionality Reduction, which combines multiple features into a smaller set of new features.


1. Feature Selection

Feature selection is the process of identifying the features that actually matter for the prediction and keeping only those.

In other words, it's about keeping the data most likely to influence college acceptance while discarding the rest.


Example of Feature Selection

  • "Math, Science, History scores" → likely important ✅
  • "Attendance rate" → could be important ✅
  • "Athletic ability" → not very important ❌
  • "Social media usage time" → might not be relevant ❌

By applying feature selection, we can retain only important information and discard unnecessary data.

This can speed up computation and improve prediction accuracy.
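As a rough sketch of what this can look like in code, the example below uses scikit-learn's SelectKBest to keep the two features most associated with the admission label. The numbers and column meanings are made up purely for illustration:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical student data, one row per student:
# [math, science, attendance_rate, athletics, social_media_hours]
X = np.array([
    [85, 80, 0.95, 3, 4.0],
    [72, 68, 0.80, 9, 6.5],
    [90, 95, 0.98, 2, 1.5],
    [60, 55, 0.70, 8, 7.0],
    [88, 91, 0.92, 4, 2.0],
    [65, 60, 0.75, 7, 5.5],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = admitted, 0 = not admitted

# Score every feature against the label (ANOVA F-test) and keep the best k=2
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print("Scores per feature:", selector.scores_)
print("Kept feature indices:", selector.get_support(indices=True))
print("Reduced data shape:", X_selected.shape)  # (6, 2)
```

With data like this, features such as athletics or social media hours would typically receive low scores and be dropped, matching the intuition above.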


2. Dimensionality Reduction

Dimensionality reduction involves combining several features into a smaller number of new features.

While feature selection is about "discarding the unnecessary," dimensionality reduction is about "merging similar features."


Example of Dimensionality Reduction

"Math score + Science score + History score" → consolidate into a single "Academic Achievement" score ✅

This approach allows us to maintain as much information as possible while reducing the number of features to handle.
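As a quick illustration of such a manual merge, here is how it might look with pandas; the column names and scores are hypothetical:

```python
import pandas as pd

# Hypothetical subject scores, one row per student
df = pd.DataFrame({
    "math": [85, 72, 90],
    "science": [80, 68, 95],
    "history": [78, 75, 88],
})

# Merge the three subject scores into a single "academic_achievement" feature
df["academic_achievement"] = df[["math", "science", "history"]].mean(axis=1)
print(df)
```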

A popular method for this is PCA (Principal Component Analysis).
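Here is a minimal sketch of the same idea using scikit-learn's PCA with made-up scores. Instead of a simple average, PCA finds the combination of the three correlated score columns that preserves the most variation in the data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical subject scores: [math, science, history]
scores = np.array([
    [85, 80, 78],
    [72, 68, 75],
    [90, 95, 88],
    [60, 55, 62],
    [88, 91, 85],
])

# Compress the three correlated columns into one principal component
pca = PCA(n_components=1)
academic_achievement = pca.fit_transform(scores)

print(academic_achievement.ravel())  # one combined value per student
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio tells us how much of the original information the single new feature retains.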


To summarize these concepts:

✅ Feature Selection: Keeping only the important features (removing the unnecessary ones)

✅ Dimensionality Reduction: Merging similar features to reduce quantity

In the next session, we will explore Labels, which play the role of answers in supervised learning.
