The WGAN paper proposes Algorithm 1 (cf. page 8), which also specifies the losses for the critic and the generator.
When implementing the critic loss (lines 5 and 6 of Algorithm 1), they maximize it with respect to the parameters $w$ (instead of minimizing, as one usually would) via the update $w \leftarrow w + \alpha \cdot \text{RMSProp}\left(w, g_w \right)$. Their loss seems to be $$\frac{1}{m}\sum_{i = 1}^{m}f_{w}\left(x^{\left(i\right)} \right) - \frac{1}{m}\sum_{i = 1}^{m} f_{w}\left( g_{\theta}\left( z^{\left( i\right)}\right)\right).\quad \quad (1)$$
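For concreteness, here is a minimal PyTorch sketch of one full critic update (lines 5-7 of Algorithm 1). The names `critic`, `generator`, `real_images`, and `noise` are placeholders for one's own models and batches; the learning rate and clipping constant are the paper's defaults ($\alpha = 5 \cdot 10^{-5}$, $c = 0.01$):

    import torch

    # Placeholders: critic, generator, real_images, noise are assumed to exist
    opt_critic = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

    # Lines 5-6 of Algorithm 1: maximize Eq. (1) in w.
    # PyTorch optimizers minimize, so we descend on the negation instead.
    loss_critic = -(torch.mean(critic(real_images))
                    - torch.mean(critic(generator(noise).detach())))
    opt_critic.zero_grad()
    loss_critic.backward()
    opt_critic.step()

    # Line 7 of Algorithm 1: clip critic weights to [-c, c] with c = 0.01
    for p in critic.parameters():
        p.data.clamp_(-0.01, 0.01)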
The function $f_w$ is the critic, i.e., a neural network. In this YouTube video (cf. minutes 11:00 to 12:26), this loss is implemented in PyTorch as follows:
    import torch

    # Critic scores on a batch of real and generated images
    critic_real = critic(real_images)
    critic_fake = critic(generator(noise))
    # Negation of Eq. (1): minimizing this maximizes Eq. (1)
    loss_critic = -(torch.mean(critic_real) - torch.mean(critic_fake))
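The leading minus sign is there because PyTorch optimizers minimize: minimizing the negation of Eq. (1) is equivalent to the paper's ascent step $w \leftarrow w + \alpha \cdot \text{RMSProp}\left(w, g_w\right)$. The logged `loss_critic` is therefore the negative of Eq. (1).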
My question is: in my own experiments on the CelebA dataset, I found that the critic loss is negative, and that image quality is better when this negative loss is higher rather than lower: e.g., a critic loss of $-0.75$ produced better generated images than a critic loss of $-1.26$.
Could there be an error in the video's implementation of Eq. (1) and Algorithm 1 of the WGAN paper? In my opinion the implementation in the video is correct, but then I am still confused about why I get better images when the loss is higher ...
Cheers!