I was learning about GANs when I came across the term "label smoothing". In the video tutorial I watched, label smoothing was used to change the binary labels when computing the loss of the discriminator network: instead of using 1 for the real samples, they use 0.9. What is the main purpose of this label smoothing?
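To make it concrete, here is a minimal sketch of what I mean, written in PyTorch (my own illustration, not the tutorial's exact code; only the 0.9 target value comes from the tutorial):

```python
import torch
import torch.nn.functional as F

# Hypothetical discriminator outputs (logits) for a batch of 4 samples.
real_logits = torch.randn(4, 1)
fake_logits = torch.randn(4, 1)

# One-sided label smoothing: targets for real samples are 0.9 instead of 1.0,
# while targets for fake samples stay at 0.0.
real_targets = torch.full_like(real_logits, 0.9)
fake_targets = torch.zeros_like(fake_logits)

# Discriminator loss as the sum of the real and fake binary cross-entropy terms.
d_loss = (
    F.binary_cross_entropy_with_logits(real_logits, real_targets)
    + F.binary_cross_entropy_with_logits(fake_logits, fake_targets)
)
print(d_loss)
```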
I've skimmed through the original paper, and honestly a lot of the maths is difficult for me to follow. But I noticed this paragraph in there:
We propose a mechanism for encouraging the model to be less confident. While this may not be desired if the goal is to maximize the log-likelihood of training labels, it does regularize the model and makes it more adaptable
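If I'm reading the paper correctly, the mechanism replaces the one-hot target with a mixture of the one-hot distribution and a uniform distribution over the $K$ classes, controlled by a smoothing amount $\epsilon$:

$$
q'(k \mid x) = (1 - \epsilon)\,\delta_{k,y} + \frac{\epsilon}{K}
$$

so the target for the true class $y$ becomes $1 - \epsilon + \epsilon/K$ instead of 1, and every other class gets $\epsilon/K$ instead of 0.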
And it raises two further questions for me:

Why might this "not be desired if the goal is to maximize the log-likelihood of training labels"?
What do they mean by "adaptable"?