I'm learning machine learning by reading through other people's kernels on Kaggle, specifically this Mushroom Classification kernel.
The author first applied PCA to the transformed indicator matrix and kept only 2 principal components, which are later used for visualization. I then checked how much variance those two components retain and found that it is only about 16%:
In [18]: pca.explained_variance_ratio_.cumsum()
Out[18]: array([0.09412961, 0.16600686])
Yet the test result, about 90% accuracy, suggests the model works well.
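For reference, here is a minimal sketch of the pipeline as I understand it, not the author's exact code; the file name mushrooms.csv, the target column "class", and the logistic regression classifier are my assumptions:

    # Minimal sketch (assumptions: mushrooms.csv layout, "class" target, logistic regression)
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("mushrooms.csv")                      # assumed file name
    X = pd.get_dummies(df.drop(columns=["class"]))         # indicator (one-hot) matrix
    y = (df["class"] == "p").astype(int)                   # poisonous vs edible

    pca = PCA(n_components=2)                              # keep only 2 components
    X_2d = pca.fit_transform(X)
    print(pca.explained_variance_ratio_.cumsum())          # ~[0.094, 0.166] as above

    # Classify using only the 2 PCA components
    X_train, X_test, y_train, y_test = train_test_split(
        X_2d, y, test_size=0.3, random_state=0)
    clf = LogisticRegression().fit(X_train, y_train)
    print(clf.score(X_test, y_test))                       # test accuracy; the kernel reports ~90%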
If variance stands for information, how can the model still work well when so much of the information has been lost?