Feature scaling, in general, is an important stage in the data preprocessing pipeline.
Decision Tree and Random Forest algorithms, though, are scale-invariant, i.e. they work fine without feature scaling. Why is that?
Scaling only matters when something in the model actually reacts to the scale of a feature. A decision tree, though, just makes a cut at a certain threshold value.
Imagine a feature that ranges from 0 to 100, where a cut at 50 gives the best split. Scaling the feature down to the range 0 to 1 simply moves that cut to 0.5 and changes nothing: the same samples end up on each side of the split. The sketch below illustrates this.
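A minimal sketch of this, assuming scikit-learn is available (the dataset choice is just for illustration): the same tree is fit on raw and on min-max scaled features. Because scaling preserves the ordering of values within each feature, both trees should choose the same splits, only expressed on different scales.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tree on the raw features
raw_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Tree on min-max scaled features (each feature mapped to [0, 1])
scaler = MinMaxScaler().fit(X_train)
scaled_tree = DecisionTreeClassifier(random_state=0).fit(
    scaler.transform(X_train), y_train)

# Predictions should agree: only the split thresholds moved with the scale.
print(np.array_equal(raw_tree.predict(X_test),
                     scaled_tree.predict(scaler.transform(X_test))))  # expected: True

# Root split: same feature index in both trees, threshold expressed on each scale.
print(raw_tree.tree_.feature[0], raw_tree.tree_.threshold[0])
print(scaled_tree.tree_.feature[0], scaled_tree.tree_.threshold[0])
```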
Neural networks, on the other hand, use activation functions (leaving ReLU aside) that react differently to inputs above 1; sigmoid or tanh units, for example, saturate for large inputs. Here normalization, putting every feature between 0 and 1, makes sense.
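To make that concrete, here is a hedged sketch (again with scikit-learn, using a logistic-activation MLP purely as an illustration). On data where some features take values in the hundreds, the unscaled network often trains poorly compared to the same network behind a min-max scaler; exact scores will vary by dataset and random seed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Sigmoid-style activations saturate for large inputs, so unscaled features
# (some here reach into the hundreds) tend to slow or stall training.
unscaled = MLPClassifier(activation="logistic", max_iter=1000, random_state=0)
scaled = make_pipeline(MinMaxScaler(),
                       MLPClassifier(activation="logistic", max_iter=1000,
                                     random_state=0))

print(unscaled.fit(X_train, y_train).score(X_test, y_test))  # often noticeably lower
print(scaled.fit(X_train, y_train).score(X_test, y_test))    # typically higher
```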
Feature scaling matters whenever a model relies on a distance metric (or some other magnitude-sensitive numerical evaluation of the features). Models such as support vector machines, neural networks, distance-based clustering methods (e.g. k-means) and linear/logistic regression are therefore affected by feature scaling.
Models that are based on probabilities or threshold rules rather than on distances are scale-invariant. These include Naive Bayes classifiers and decision trees.
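As a hedged illustration of the distance-based case (the wine dataset is just a convenient example because its features live on very different scales, e.g. proline is in the hundreds): a k-nearest-neighbors classifier changes its behavior depending on whether the features are standardized, because the largest-scale feature otherwise dominates the Euclidean distance.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unscaled: the large-valued features dominate the Euclidean distance
# and effectively drown out the others.
knn_raw = KNeighborsClassifier().fit(X_train, y_train)

# Scaled: every feature contributes comparably to the distance.
knn_scaled = make_pipeline(StandardScaler(),
                           KNeighborsClassifier()).fit(X_train, y_train)

print(knn_raw.score(X_test, y_test))     # typically much lower
print(knn_scaled.score(X_test, y_test))  # typically much higher
```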