
Say I'm training a model for multiple tasks by minimizing the sum of losses $L_1 + L_2$ via gradient descent.

If these losses are on different scales, the one with the greater range will dominate the optimization. I'm currently trying to fix this by introducing a hyperparameter $\lambda$ and tuning it to bring the losses to the same scale, i.e., I minimize $L_1 + \lambda \cdot L_2$, where $\lambda > 0$.
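For concreteness, here is a minimal sketch of what I'm currently doing (PyTorch-style; the two-head model, the particular loss functions, and the value of $\lambda$ are just placeholders):

```python
import torch
import torch.nn.functional as F

lam = 0.1  # hyperparameter I currently tune by hand

def training_step(model, optimizer, x, y1, y2):
    out1, out2 = model(x)              # two task heads on a shared backbone
    loss1 = F.mse_loss(out1, y1)       # e.g. a regression task
    loss2 = F.cross_entropy(out2, y2)  # e.g. a classification task
    loss = loss1 + lam * loss2         # weighted sum of the two losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss1.item(), loss2.item()
```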

However, I'm not sure if this is a good approach. In short, what are some strategies to deal with losses having different scales when doing multi-task learning? I'm particularly interested in deep learning scenarios.

SpiderRico
I've seen the approach you're currently using multiple times in different contexts, such as Bayesian neural networks and multi-objective optimization with evolutionary algorithms, so your approach is at least not new and may work for your purposes. Off the top of my head, no completely different approach comes to mind, but it's possible there are others. So, I think the main question/problem is not whether your approach is good, but how to scale the losses. Maybe this Wikipedia article can be useful: https://en.wikipedia.org/wiki/Multi-objective_optimization – nbro Mar 18 '21 at 13:18

1 Answer


I am currently working on a similar problem. I think your approach is good. As for setting $\lambda$: since you are using deep neural networks, you can make it a learnable parameter instead of a hyperparameter that you set by hand. This way, as the two losses fluctuate over your training iterations/epochs, the model will be able to adjust $\lambda$ accordingly.
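As a rough illustration (not from the original answer): a raw unconstrained $\lambda$ in $L_1 + \lambda \cdot L_2$ would simply be driven toward zero, since the gradient with respect to $\lambda$ is just $L_2$. One standard way to make the weights learnable while avoiding that collapse is the homoscedastic-uncertainty weighting of Kendall et al. (2018), where each task gets a learned log-variance. A PyTorch sketch, with the per-task losses assumed to be computed elsewhere:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine task losses with learnable weights (Kendall et al., 2018).

    Each task i gets a learnable log-variance s_i, and the combined loss is
        sum_i exp(-s_i) * L_i + s_i,
    where the s_i terms regularize the weights so they cannot collapse to zero.
    """

    def __init__(self, num_tasks: int = 2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, *losses):
        total = 0.0
        for i, loss in enumerate(losses):
            total = total + torch.exp(-self.log_vars[i]) * loss + self.log_vars[i]
        return total

# Usage sketch: optimize the weighting module's parameters
# together with the model's parameters.
# criterion = UncertaintyWeightedLoss(num_tasks=2)
# optimizer = torch.optim.Adam(
#     list(model.parameters()) + list(criterion.parameters()), lr=1e-3)
# loss = criterion(loss1, loss2)
# loss.backward(); optimizer.step()
```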

rachkov91