
Say I'm training a model for multiple tasks by minimizing the sum of losses $L_1 + L_2$ via gradient descent.

If these losses are on different scales, the one with the greater range will dominate the optimization. I'm currently trying to fix this by introducing a hyperparameter $\lambda$ and tuning it to bring the losses to the same scale, i.e., I minimize $L_1 + \lambda \cdot L_2$, where $\lambda > 0$.
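For concreteness, here is a minimal sketch of what I'm currently doing (PyTorch-style; the two-head model, the particular loss functions, and the value of $\lambda$ are just placeholders):

```python
import torch
import torch.nn.functional as F

lam = 0.1  # hyperparameter I currently tune by hand

def training_step(model, optimizer, x, y1, y2):
    out1, out2 = model(x)              # two task heads on a shared backbone
    loss1 = F.mse_loss(out1, y1)       # e.g. a regression task
    loss2 = F.cross_entropy(out2, y2)  # e.g. a classification task
    loss = loss1 + lam * loss2         # weighted sum of the two losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss1.item(), loss2.item()
```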

However, I'm not sure if this is a good approach. In short, what are some strategies to deal with losses having different scales when doing multi-task learning? I'm particularly interested in deep learning scenarios.

SpiderRico
I've seen the approach you're currently using multiple times in different contexts, such as Bayesian neural networks and multi-objective optimization with evolutionary algorithms, so your approach is at least not new and may work for your purposes. Off the top of my head, no completely different approach comes to mind, but it's possible there are others. So, I think the main question/problem is not whether your approach is good, but how to scale the losses. Maybe this Wikipedia article can be useful: https://en.wikipedia.org/wiki/Multi-objective_optimization – nbro Mar 18 '21 at 13:18

1 Answer


I am currently working on a similar problem. I think your approach is good. As for setting $\lambda$: since you are using deep neural networks, you can make it a learnable parameter instead of a hyperparameter that you set by hand. This way, as the two losses fluctuate over your training iterations/epochs, the model will be able to adjust $\lambda$ accordingly.
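As a rough illustration (not from the original answer): a raw unconstrained $\lambda$ in $L_1 + \lambda \cdot L_2$ would simply be driven toward zero, since the gradient with respect to $\lambda$ is just $L_2$. One standard way to make the weights learnable while avoiding that collapse is the homoscedastic-uncertainty weighting of Kendall et al. (2018), where each task gets a learned log-variance. A PyTorch sketch, with the per-task losses assumed to be computed elsewhere:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine task losses with learnable weights (Kendall et al., 2018).

    Each task i gets a learnable log-variance s_i, and the combined loss is
        sum_i exp(-s_i) * L_i + s_i,
    where the s_i terms regularize the weights so they cannot collapse to zero.
    """

    def __init__(self, num_tasks: int = 2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, *losses):
        total = 0.0
        for i, loss in enumerate(losses):
            total = total + torch.exp(-self.log_vars[i]) * loss + self.log_vars[i]
        return total

# Usage sketch: optimize the weighting module's parameters
# together with the model's parameters.
# criterion = UncertaintyWeightedLoss(num_tasks=2)
# optimizer = torch.optim.Adam(
#     list(model.parameters()) + list(criterion.parameters()), lr=1e-3)
# loss = criterion(loss1, loss2)
# loss.backward(); optimizer.step()
```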

rachkov91