
The deep ensembles paper (https://arxiv.org/pdf/1612.01474.pdf) introduces proper scoring rules for ensembles of NNs. It turns out that the (negative log-)likelihood is always a proper scoring rule. For regression tasks, we can therefore use the Gaussian NLL, which makes the network predict its output variance alongside the mean, optimising the quadratic error term while accounting for that variance. The advantage here is quite clear to me.
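To make concrete what I mean, here is a minimal PyTorch sketch of the regression setup as I understand it: a network with two heads (a hypothetical `GaussianMLP`, names mine) predicting a mean and a positive variance, trained with the Gaussian NLL. This is my reading of the paper's loss, not code from the paper itself:

```python
import torch
import torch.nn as nn

class GaussianMLP(nn.Module):
    """Regression net predicting a per-input mean and variance (illustrative)."""
    def __init__(self, in_dim, hidden=50):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, 1)
        self.var_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        mean = self.mean_head(h)
        # softplus plus a small constant keeps the predicted variance positive
        var = nn.functional.softplus(self.var_head(h)) + 1e-6
        return mean, var

def gaussian_nll(mean, var, y):
    # NLL of y under N(mean, var), dropping the constant log(2*pi)/2 term:
    # 0.5 * log(var) + (y - mean)^2 / (2 * var)
    return (0.5 * torch.log(var) + (y - mean) ** 2 / (2 * var)).mean()
```

So in regression the loss itself changes relative to a plain MSE-trained ensemble, which is why the benefit is obvious to me there.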

But I don't understand how deep ensembles differ from plain ensembles for classification tasks. In both cases, each NN is trained independently on the (binary) cross-entropy loss, and the predictions are then averaged (see the sketch below). So, apart from the adversarial training introduced in the paper above, what is the difference between the two?
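For reference, this is the averaging step I have in mind for classification; as far as I can tell, both a plain ensemble and a deep ensemble do exactly this at prediction time (my own sketch, not from the paper):

```python
import torch

def ensemble_predict(models, x):
    # Average the per-network class probabilities (softmax outputs),
    # exactly as a plain ensemble of independently trained NNs would.
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)
```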
