I am having some trouble with the probability densities described in the original GAN paper. My question is based on Goodfellow et al.'s paper and tutorial: Generative Adversarial Networks and NIPS 2016 Tutorial: Generative Adversarial Networks.
When Goodfellow et al. talk about probability distributions/densities in their paper, are they talking about discrete or continuous probability distributions? I don't think it's made clear.
In the continuous case, this would imply, for instance, that both $p_{data}$ and $p_g$ must be differentiable, since the optimal discriminator (see Prop. 1) is essentially a function of their ratio and is itself assumed to be differentiable. Moreover, the very existence of a continuous density $p_g$ is non-trivial: if, say, the latent variable has lower dimension than the data, the generator's output distribution is supported on a low-dimensional manifold and has no density with respect to Lebesgue measure at all. One sufficient condition for a density to exist is that $G$ be a diffeomorphism (see normalising flows), but this is rarely the case. So it seems that much stronger assumptions are needed.
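For concreteness, these are the two formulas my reasoning above rests on: the optimal discriminator from Prop. 1 of the paper, and the standard change-of-variables identity that would yield a density for $p_g$ in the diffeomorphism case (the notation $p_z$ for the prior over the latent code $z$ is mine):

$$D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}, \qquad p_g(x) = p_z\!\left(G^{-1}(x)\right)\left|\det \frac{\partial G^{-1}(x)}{\partial x}\right|.$$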
If the answer is discrete distributions: the differentiability of $G$ implies that the generator's outputs vary continuously with its inputs. How can this be reconciled with a discrete distribution over those outputs? Does the answer have something to do with the fact that we can only represent a finite set of numbers on a computer anyway?
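To make that last point concrete, here is a minimal Python sketch (the toy generator is hypothetical, not from either paper) showing that even a perfectly smooth $G$ produces outputs supported on a finite set once we work in floating point:

```python
import numpy as np

# Hypothetical toy generator: a smooth (hence differentiable) map from
# latent noise to output space, as backprop through G requires.
def generator(z):
    return np.tanh(2.0 * z)

rng = np.random.default_rng(0)
x = generator(rng.standard_normal(10_000).astype(np.float32))

# Mathematically the outputs range over the continuum (-1, 1), but in
# float32 they land on a finite grid; near 1.0 adjacent representable
# values are this far apart:
print(np.spacing(np.float32(1.0)))  # ~1.19e-07
# So the machine-level output distribution is supported on a finite set,
# even though G itself is continuous and differentiable.
```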