
I have a model with 3 outputs (it is a regression task: the steering wheel angle, brake and acceleration). I can divide each output's values into smaller bins and in this way turn it into a classification problem. Then I can balance the data so that each bin has the same number of data points.
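Binning one output and then balancing its bins can be sketched as follows. This is a minimal example using only the Python standard library, assuming random undersampling as the balancing method (every bin keeps as many samples as the smallest bin has); the bin edges, the steering values and the helper names `to_bin` and `undersample_by_bin` are hypothetical.

```python
import bisect
import random
from collections import defaultdict

def to_bin(value, edges):
    """Return the bin index of a continuous value, given sorted interior edges.

    k interior edges define k + 1 bins, numbered 0..k.
    """
    return bisect.bisect_right(edges, value)

def undersample_by_bin(values, edges, seed=0):
    """Random undersampling: keep as many values from every bin as the
    smallest non-empty bin contains."""
    bins = defaultdict(list)
    for v in values:
        bins[to_bin(v, edges)].append(v)
    n_min = min(len(b) for b in bins.values())
    rng = random.Random(seed)
    balanced = []
    for idx in sorted(bins):
        balanced.extend(rng.sample(bins[idx], n_min))
    return balanced

# Hypothetical steering angles, mostly close to zero (driving straight).
steering = [0.0, 0.01, -0.02, 0.0, 0.03, 0.5, -0.6, 0.02, -0.01, 0.7]
edges = [-0.25, 0.25]  # 3 bins: left (0), straight (1), right (2)
balanced = undersample_by_bin(steering, edges)
print(len(balanced))  # 3: one value per bin
```

Undersampling discards data; oversampling (repeating samples from the small bins) or weighting instances are alternatives that keep every sample.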

But now I wonder how to balance this data correctly. I found some good resources and libraries:
  • imbalanced-learn (Python, official documentation)
  • multi-imbalance (Python, official documentation)
  • Multi-imbalance (Poznan University of Technology)

But to my understanding, these algorithms can deal with imbalanced data (in binary and multi-class classification) only when there is a single output. I have 3 outputs, and they may be correlated. How do I balance them correctly?

I thought of two ideas:

  1. Creating tuples of the 3 (binned) outputs and balancing so that every distinct tuple occurs the same number of times. But you can end up with a situation like (A, X, 1), (A, Y, 2), (A, Y, 3), (B, Z, 3): these tuples are all different, yet the value A appears in the first position in three of the four, so the data is still quite imbalanced.

  2. Balancing the data iteratively, considering only one column at a time: you balance the first column, then the second column, and so on.
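Both ideas can be checked by balancing on some key and then counting each column separately. A minimal sketch, assuming random undersampling as the balancing step; the helper names `balance_on` and `column_counts` and the second data set are hypothetical. On the question's own rows, balancing on the whole tuple keeps all four rows (each tuple occurs once), so the first column stays 3:1. In the second data set the first column starts balanced, but the pass over the second column drops one of the A rows and undoes that.

```python
import random
from collections import Counter, defaultdict

def balance_on(rows, key, seed=0):
    """Random undersampling: keep the same number of rows (the smallest
    group size) from every group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    n_min = min(len(g) for g in groups.values())
    rng = random.Random(seed)
    out = []
    for k in sorted(groups):
        out.extend(rng.sample(groups[k], n_min))
    return out

def column_counts(rows):
    """Per-column value counts, to check the marginal balance of each target."""
    return [Counter(col) for col in zip(*rows)]

# Idea 1: balance on the whole tuple (the question's example rows).
rows1 = [("A", "X", 1), ("A", "Y", 2), ("A", "Y", 3), ("B", "Z", 3)]
joint = balance_on(rows1, key=lambda r: r)
print(column_counts(joint)[0])  # Counter({'A': 3, 'B': 1})

# Idea 2: balance one column at a time (hypothetical rows).
rows2 = [("A", "X", 1), ("A", "X", 1), ("B", "Y", 2), ("B", "Z", 3)]
iterative = rows2
for c in range(3):
    iterative = balance_on(iterative, key=lambda r, c=c: r[c])
print(column_counts(iterative)[0])  # Counter({'B': 2, 'A': 1})
```

Whichever idea is used, checking every column's counts after the final step shows whether the marginal distributions are actually balanced.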

Are these ideas good? Or are there other options for balancing data with multiple targets?

Shayan Shafiq

1 Answer


You can try weighting your training instances. For example, if class A has proportion $p_A$, you weight every instance of class A by $1/p_A$. There also exist more sophisticated approaches to training on unbalanced data, such as generating synthetic samples to create a balanced dataset. You can start learning more here.
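A minimal sketch of the $1/p_A$ weighting in plain Python, assuming the labels are already binned; `inverse_frequency_weights` and the labels are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight every instance by 1 / p_c, where p_c is the proportion of its class c."""
    n = len(labels)
    counts = Counter(labels)
    # 1 / p_c = 1 / (counts[c] / n) = n / counts[c]
    return [n / counts[c] for c in labels]

labels = ["A", "A", "A", "B"]  # hypothetical: p_A = 3/4, p_B = 1/4
weights = inverse_frequency_weights(labels)
# A instances get 4/3 each, the B instance gets 4.0
```

With these weights every class has the same total weight (here 4 for A and 4 for B). Many scikit-learn estimators accept such a list through the `sample_weight` argument of `fit`, and Keras `Model.fit` has a `sample_weight` argument as well.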

SpiderRico
  • These are reasonable approaches. In this specific example, instead of weighting based on the proportion of each class, you would weight based on the proportion of each tuple. – Snehal Patel Dec 25 '22 at 04:24
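Following the comment, weighting by the proportion of each (steering, brake, acceleration) tuple can be sketched as below, assuming the outputs have already been binned; `tuple_weights`, `weighted_marginal` and the data are hypothetical. Every distinct tuple ends up with the same total weight, but, as with the first idea in the question, the weight summed over a single output can still be uneven, so it is worth checking each output's weighted marginal too.

```python
from collections import Counter, defaultdict

def tuple_weights(targets):
    """Weight each instance by 1 / p(tuple), where the tuple is the
    combination of all binned outputs of that instance."""
    n = len(targets)
    counts = Counter(targets)
    return [n / counts[t] for t in targets]

def weighted_marginal(targets, weights, col):
    """Total weight per value of a single output column."""
    totals = defaultdict(float)
    for t, w in zip(targets, weights):
        totals[t[col]] += w
    return dict(totals)

targets = [("A", "X", 1), ("A", "X", 1), ("A", "Y", 2), ("B", "Z", 3)]
w = tuple_weights(targets)           # [2.0, 2.0, 4.0, 4.0]
print(weighted_marginal(targets, w, 0))  # {'A': 8.0, 'B': 4.0}
```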