When approaching a machine learning task, have you ever felt overwhelmed by the sheer number of features?

Most data scientists face this challenge. While adding enrichment features can bring more information, it often slows down training and makes it harder to detect hidden patterns, a phenomenon famously known as the curse of dimensionality.

Moreover, high-dimensional spaces behave in surprising ways. To describe this concept with an analogy, think of the novel Flatland, where the characters live in a flat (2-dimensional) world and are bewildered when they encounter a 3-dimensional creature. In the same way, we struggle to grasp that, in high-dimensional space, most points are outliers and the distance between points is usually larger than expected. All these phenomena, if not handled properly, can be disastrous for our machine learning models.
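This distance behavior is easy to verify empirically. The following sketch (my own illustration, not from the article) samples random points in the unit hypercube and measures how the contrast between the nearest and farthest pair shrinks as the number of dimensions grows:

```python
import numpy as np

# As dimensionality grows, pairwise distances between random points
# concentrate: "near" and "far" neighbors become hard to distinguish.
rng = np.random.default_rng(42)
contrasts = {}

for d in (2, 10, 100, 1000):
    points = rng.random((200, d))  # 200 random points in the d-dimensional unit hypercube
    # All pairwise Euclidean distances (upper triangle, excluding the diagonal)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = dists[np.triu_indices(200, k=1)]
    # Relative contrast: how much farther the farthest pair is than the nearest
    contrasts[d] = (upper.max() - upper.min()) / upper.min()
    print(f"d={d:5d}  mean distance={upper.mean():7.3f}  relative contrast={contrasts[d]:8.2f}")
```

Running this, the relative contrast collapses as `d` increases, which is exactly why distance-based methods degrade on raw high-dimensional data.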

In this post, I will explain some advanced dimensionality reduction techniques used to mitigate these problems.

In my previous article, I introduced the relevance of dimensionality reduction in machine learning problems and how to tame the curse of dimensionality, and I explained both the theory and the Scikit-Learn implementation of the Principal Component Analysis (PCA) algorithm.

Following this, I will go into additional dimensionality reduction algorithms, such as Kernel PCA (KPCA) and Locally Linear Embedding (LLE), which overcome some of the limitations of PCA.
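As a quick preview of the two algorithms named above, here is a minimal sketch (my own, using their standard Scikit-Learn implementations) applied to the classic Swiss-roll toy dataset, a curved manifold that plain PCA cannot unroll linearly:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# 3-D points lying on a rolled-up 2-D sheet
X, _ = make_swiss_roll(n_samples=1000, noise=0.1, random_state=0)

# Kernel PCA: performs PCA in an implicit feature space defined by an RBF kernel
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_kpca = kpca.fit_transform(X)

# LLE: preserves the local linear relationships between each point and its neighbors
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=0)
X_lle = lle.fit_transform(X)

print(X_kpca.shape, X_lle.shape)  # both project the 3-D data down to 2-D
```

The hyperparameters here (`gamma=0.04`, `n_neighbors=10`) are illustrative choices, not values from the article; both are discussed in detail later.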

Don't worry if you haven't read my introduction to dimensionality reduction. This post is a self-contained guide, as I will explain each concept in simple terms. However, if you want to learn more about PCA, I believe this guide will serve your goals:
