I'm learning machine learning by reading through other people's kernels on Kaggle, specifically this Mushroom Classification kernel.
The author first applied PCA to the transformed indicator matrix and kept only 2 principal components, which are later used for visualization. I then checked how much variance those two components retain and found that it is only about 16%:
In [18]: pca.explained_variance_ratio_.cumsum()
Out[18]: array([0.09412961, 0.16600686])
Yet the test result, about 90% accuracy, suggests the model works well.
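For reference, here is a minimal sketch of the pipeline as I understand it, not the author's exact code; the file name mushrooms.csv, the target column "class", and the logistic regression classifier are my assumptions:

    # Minimal sketch (assumptions: mushrooms.csv layout, "class" target, logistic regression)
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("mushrooms.csv")                      # assumed file name
    X = pd.get_dummies(df.drop(columns=["class"]))         # indicator (one-hot) matrix
    y = (df["class"] == "p").astype(int)                   # poisonous vs edible

    pca = PCA(n_components=2)                              # keep only 2 components
    X_2d = pca.fit_transform(X)
    print(pca.explained_variance_ratio_.cumsum())          # ~[0.094, 0.166] as above

    # Classify using only the 2 PCA components
    X_train, X_test, y_train, y_test = train_test_split(
        X_2d, y, test_size=0.3, random_state=0)
    clf = LogisticRegression().fit(X_train, y_train)
    print(clf.score(X_test, y_test))                       # test accuracy; the kernel reports ~90%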
If variance stands for information, how can the model still work well when so much of the information has been lost?