
The description of feature selection based on a random forest uses trees without pruning. Do I need to prune the trees? My concern is that if I don't prune them, the forest will overfit.

Below is a plot of feature importances based on 500 trees without pruning:

[figure: feature importances, 500 unpruned trees]

And the same with trees limited to depth 3:

[figure: feature importances, depth 3]

I always use the last four features (27, 28, 29, 30), and I try to add features 0 through 26 to them by looping over the possible combinations. Empirically, features 0 and 26 seem to be significant, but neither plot shows them as important, even though classification quality improved when 0 and 26 were added.
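For concreteness, here is a minimal sketch of the combination search described above. The classifier settings, cross-validation scheme, and accuracy scoring are assumptions, since the post does not specify them, and the data is a synthetic placeholder; the subset size is capped because an exhaustive search over all subsets of features 0-26 grows exponentially.

```python
# Sketch: always keep features 27-30, try adding subsets of features 0-26,
# and keep the subset with the best cross-validated score.
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=31, random_state=0)
base = [27, 28, 29, 30]   # features that are always included
candidates = range(27)    # features 0..26 to try adding

best_score, best_subset = -np.inf, ()
for k in range(1, 3):     # subset sizes to try (kept small here)
    for subset in combinations(candidates, k):
        cols = base + list(subset)
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        score = cross_val_score(clf, X[:, cols], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(f"best added features: {best_subset}, CV accuracy: {best_score:.3f}")
```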

user287629

1 Answer


Random forest's feature importances are not reliable and you should probably avoid them. Instead, you can use permutation_importance: https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance.html#sphx-glr-auto-examples-inspection-plot-permutation-importance-py
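A minimal sketch of how permutation_importance can be used, with a synthetic dataset and model as placeholders. The returned object's importances_mean attribute gives one value per feature, analogous to clf.feature_importances_, which is what you would plot.

```python
# Sketch: compute permutation importance on a held-out set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)

# Permute each feature on the test set and measure the score drop.
result = permutation_importance(clf, X_test, y_test,
                                n_repeats=10, random_state=0)

# result.importances has shape (n_features, n_repeats);
# result.importances_mean is the per-feature average, analogous to
# clf.feature_importances_, and is what you would use for a bar plot.
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.4f} "
          f"± {result.importances_std[i]:.4f}")
```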

adrin
    "random forest's feature importances are not reliable and you should probably avoid them" is a very strong statement which does not hold. While random forest variable importance estimates are known to be biased under certain conditions (e.g. gini impurity-based estimates on a mix of continuous and categorical variables), they can provide relatively good estimates in other situations. – Jonathan Jan 16 '20 at 20:31
  • @adrin thanks. I want to get the per-feature importance values from permutation_importance, like importances = clf.feature_importances_, so I can build a graph. If I access result.importances, I get many items for each feature index, so I only end up with the indexes. Which attribute should I use? – user287629 Jan 16 '20 at 22:18