I am looking to try different loss functions for a hierarchical multi-label classification problem. So far, I have been training a model with several submodels, such as a multilayer perceptron (MLP) branch inside a bigger model, where each branch handles one level of the classification hierarchy and outputs a binary vector. I have also been using Binary Cross-Entropy (BCE) and summing all the losses in the model before backpropagating.
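For context, here is a minimal sketch of my current setup (the layer sizes, level sizes, and the shared trunk are placeholders, not my exact architecture; I use `BCEWithLogitsLoss` here for numerical stability on raw logits):

```python
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    """Shared trunk with one MLP branch (head) per hierarchy level."""
    def __init__(self, in_dim, level_sizes):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        # One head per level; each outputs logits for that level's labels.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, n))
            for n in level_sizes
        )

    def forward(self, x):
        h = self.shared(x)
        return [head(h) for head in self.heads]

model = HierarchicalClassifier(in_dim=32, level_sizes=[4, 8, 16])
criterion = nn.BCEWithLogitsLoss()  # BCE applied per level

x = torch.randn(10, 32)
targets = [torch.randint(0, 2, (10, n)).float() for n in [4, 8, 16]]

# Sum the per-level losses, then backpropagate once.
logits = model(x)
loss = sum(criterion(l, t) for l, t in zip(logits, targets))
loss.backward()
```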
I am considering trying other losses, such as PyTorch's MultiLabelSoftMarginLoss and MultiLabelMarginLoss.
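As far as I understand, these two expect different target encodings, which is what the snippet below tries to illustrate (shapes and values are made up):

```python
import torch
import torch.nn as nn

logits = torch.randn(2, 4)  # batch of 2, 4 possible labels

# MultiLabelSoftMarginLoss: targets are multi-hot {0, 1} vectors,
# same shape as the logits (like BCE).
soft_margin = nn.MultiLabelSoftMarginLoss()
y_multihot = torch.tensor([[1., 0., 1., 0.],
                           [0., 1., 0., 0.]])
print(soft_margin(logits, y_multihot))

# MultiLabelMarginLoss: targets are label *indices*, padded with -1;
# only indices before the first -1 count as positives.
margin = nn.MultiLabelMarginLoss()
y_indices = torch.tensor([[0, 2, -1, -1],
                          [1, -1, -1, -1]])
print(margin(logits, y_indices))
```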
What other loss functions are worth trying? Perhaps Hamming loss or a variation of it? Also, is it better to sum all the losses and backpropagate once, or to run a separate backward pass per loss?
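For reference, this is a toy example of the two options I am asking about (a single linear layer standing in for my model; `retain_graph=True` is needed in the second variant because both losses share the same forward graph). In this toy case the accumulated gradients match, but I am not sure whether one approach is preferable in practice:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 3)
x = torch.randn(5, 8)
y = torch.randint(0, 2, (5, 3)).float()
criterion = nn.BCEWithLogitsLoss()

# Option A: sum the losses, single backward pass.
out = model(x)
loss = criterion(out[:, :1], y[:, :1]) + criterion(out[:, 1:], y[:, 1:])
loss.backward()
grads_a = model.weight.grad.clone()
model.zero_grad()

# Option B: one backward pass per loss; gradients accumulate in .grad.
out = model(x)
loss1 = criterion(out[:, :1], y[:, :1])
loss2 = criterion(out[:, 1:], y[:, 1:])
loss1.backward(retain_graph=True)  # keep the graph for the second backward
loss2.backward()
grads_b = model.weight.grad.clone()

print(torch.allclose(grads_a, grads_b))  # True: gradients accumulate the same
```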