
The WGAN paper concretely proposes Algorithm 1 (cf. page 8), and it also states the losses for the critic and the generator.

When implementing the critic loss (lines 5 and 6 of Algorithm 1), they maximize the objective with respect to the parameters $w$ (instead of minimizing, as one would normally do), performing a gradient ascent step $w \leftarrow w + \alpha \cdot \text{RMSProp}\left(w, g_w \right)$. Their objective appears to be $$\frac{1}{m}\sum_{i = 1}^{m}f_{w}\left(x^{\left(i\right)} \right) - \frac{1}{m}\sum_{i = 1}^{m} f_{w}\left( g_{\theta}\left( z^{\left( i\right)}\right)\right).\quad \quad (1)$$

The function $f$ is the critic, i.e. a neural network, and this loss is implemented in PyTorch in this YouTube video (cf. minutes 11:00 to 12:26) as follows:

    critic_real = critic(real_images)
    critic_fake = critic(generator(noise))
    loss_critic = -(torch.mean(critic_real) - torch.mean(critic_fake))
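For context, here is a minimal sketch of one full critic update (lines 5 to 7 of Algorithm 1), including the weight-clipping step that the snippet above does not show. The toy linear critic and generator, the batch sizes, and the name `clip_value` are illustrative assumptions of mine, not taken from the paper or the video:

```python
# Sketch of one critic update from WGAN Algorithm 1 (assumptions:
# toy linear networks stand in for f_w and g_theta; clip_value = c).
import torch
import torch.nn as nn

torch.manual_seed(0)

critic = nn.Linear(2, 1)       # stand-in for the critic f_w
generator = nn.Linear(3, 2)    # stand-in for the generator g_theta
opt_critic = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
clip_value = 0.01              # the clipping constant c in Algorithm 1

real = torch.randn(64, 2)      # a batch of "real" samples
noise = torch.randn(64, 3)     # a batch of latent noise z

critic_real = critic(real)
critic_fake = critic(generator(noise).detach())  # no generator update here

# Eq. (1) is to be *maximized* in w; PyTorch optimizers minimize,
# so the objective is negated -- exactly what the video's loss does.
loss_critic = -(torch.mean(critic_real) - torch.mean(critic_fake))

opt_critic.zero_grad()
loss_critic.backward()
opt_critic.step()

# Line 7 of Algorithm 1: clip the weights to keep f_w (roughly) Lipschitz.
for p in critic.parameters():
    p.data.clamp_(-clip_value, clip_value)
```

Note that after the update, $-\text{loss\_critic}$ is the quantity the critic drives upward; it estimates the Wasserstein distance between the two batch distributions, which is why the reported `loss_critic` can legitimately be negative.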

My question is motivated by the following observation: in my own experiments with the CelebA dataset, I found that the critic loss is negative, and that the quality of the images is better when this negative loss is higher (closer to zero), so a critic loss of $-0.75$ resulted in better generated images than a critic loss of $-1.26$, for example.

Is there perhaps an error in the video's implementation of Eq. (1) and Algorithm 1 of the WGAN paper? In my opinion, the implementation in the video is correct, but then I am still confused about why I get better images when the loss is higher ...

Cheers!

    Did you observe better images also with higher positive values? The Wasserstein loss is supposed to converge to 0, since it is basically the Wasserstein distance between the distribution of real images and fake generated ones. So I would say that in the example you provide it is normal to get better images with a critic loss of -.75, closer to 0 than -1.26, which tells you the discriminator is having a hard time distinguishing between fake and real images (hence better quality). – Edoardo Guerriero Dec 28 '20 at 15:46
  • Hi Edoardo, okay, I see your point. Given that $-0.75$ is closer to $0$ than $-1.26$, it makes sense that I see better images, I guess. The only thing slightly confusing me is that I haven't seen a positive loss for the critic yet; is that normal? – Anonymous5638 Dec 29 '20 at 20:53
  • In my experience it is possible to get negative scores using the Wasserstein loss. Again, that is because rather than a usual loss, the score represents a distance between two means, which the discriminator tries to maximize. A negative score simply means that the mean of the distribution of the generated images is bigger than the mean of the distribution of the real images; a positive score means the other way around. Either way, the game of the discriminator is just to maximize the distance, i.e. increase the absolute value, regardless of the sign in front of it. – Edoardo Guerriero Dec 29 '20 at 23:14
  • Just one last piece of advice: looking at the generator and discriminator losses is obviously important, but in my experience only for understanding whether the training is stable. Ultimately, what you care about is the quality of the images produced, and you should implement some specific metrics to evaluate that, and rely mostly on those metrics for hyperparameter tuning. Don't waste too much time trying to get 'good values' on the discriminator and generator losses. – Edoardo Guerriero Dec 30 '20 at 12:35
