
I'm currently working on a regression problem and I have 10 inputs/attributes.

What should I do if there are correlations between different features of the input data? Does the correlation between inputs affect the performance (e.g. accuracy) of the model?

nbro

1 Answer


Zero correlation does not imply independence: two uncorrelated features can still be dependent, because (Pearson) correlation only measures linear association. The converse does hold: non-zero correlation implies dependence (see https://stats.stackexchange.com/q/113417/82135 for more details). So, if two of your features have non-zero correlation, they are dependent, which means that each one gives you information about the other. In that sense, one of the two is, at least partially, redundant.
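To make the distinction concrete, here is a minimal sketch in plain Python. The data and the helper name `pearson` are made up for illustration. One target is fully determined by `x` but has zero Pearson correlation with it, and the other is linearly related to `x`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative data: x is symmetric around zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y_quadratic = [v ** 2 for v in x]   # fully determined by x, but not linearly
y_linear = [2 * v + 1 for v in x]   # fully determined by x, linearly

r_quadratic = pearson(x, y_quadratic)
r_linear = pearson(x, y_linear)

print(r_quadratic)  # prints 0.0: uncorrelated, yet clearly dependent
print(r_linear)     # prints 1.0: correlated, hence dependent
```

So a correlation of zero between two features says nothing about whether one can be predicted from the other, whereas a non-zero correlation already tells you they share information.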

Redundant features do not necessarily hurt the performance (e.g. the accuracy) of a model. However, if you reduce the number of features, training may be faster.

To reduce the number of features, you may want to try a dimensionality reduction technique.
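As one possible illustration, the sketch below applies principal component analysis (PCA) to synthetic data. PCA is only one dimensionality reduction technique among several, and it is my choice here, not something the question specifies. The data (three features, two of them strongly correlated) and the helper names `top_eigenpair` and `deflate` are assumptions made up for the example. It uses only the standard library and finds the components by power iteration; in practice you would more likely use a library implementation.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: 200 samples, 3 features. f2 is almost a scaled copy
# of f1 (strongly correlated); f3 is unrelated noise.
rows = []
for _ in range(200):
    f1 = random.gauss(0, 1)
    f2 = 2 * f1 + random.gauss(0, 0.1)
    f3 = random.gauss(0, 1)
    rows.append([f1, f2, f3])

n, d = len(rows), len(rows[0])

# Center each feature, then build the sample covariance matrix (d x d).
means = [sum(r[j] for r in rows) / n for j in range(d)]
centered = [[r[j] - means[j] for j in range(d)] for r in rows]
cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
       for i in range(d)]

def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector."""
    size = len(matrix)
    v = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
    return eigenvalue, v

def deflate(matrix, eigenvalue, vector):
    """Remove an already-found component so the next one becomes dominant."""
    size = len(matrix)
    return [[matrix[i][j] - eigenvalue * vector[i] * vector[j]
             for j in range(size)] for i in range(size)]

lam1, pc1 = top_eigenpair(cov)
lam2, pc2 = top_eigenpair(deflate(cov, lam1, pc1))

# Fraction of the total variance kept by the first two components.
total_variance = sum(cov[i][i] for i in range(d))
explained = (lam1 + lam2) / total_variance

# Project the 3-feature data onto the 2 principal components.
reduced = [[sum(r[j] * pc[j] for j in range(d)) for pc in (pc1, pc2)]
           for r in centered]

print(f"variance kept by 2 of 3 components: {explained:.4f}")
print(len(reduced), len(reduced[0]))
```

Because two of the three features are nearly redundant, two components keep almost all of the variance. With less strongly correlated features, the reduction would discard more information, so it is worth checking the retained variance (and the downstream regression error) before committing to it.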

nbro