
In the original CycleGAN paper, on the second page, there is a sentence that I don't quite understand:

In theory, this objective can induce an output distribution over $\hat{y}$ that matches the empirical distribution $p_{\text{data}}(y)$ (in general, this requires $G$ to be stochastic) [16].

What does $p_{\text{data}}(y)$ denote? And what would its empirical distribution look like?

The loss functions also contain $x \sim p_{\text{data}}(x)$, which I don't understand either.

Could anyone please elaborate further and explain this sentence to me?

nbro

1 Answer


I interpret $p_{data}(y)$ as the empirical probability of seeing an image $y$ in the training data.

For example, in a typical training run, each training image is shown to the network the same number of times, so $p_{data}(y)$ is a discrete uniform distribution over the $N$ training images: $p_{data}(y) = \frac{1}{N}$. Thus, in this case:

In theory, this objective can induce an output distribution over $\hat y$ that matches the empirical distribution $p_{data}(y)$.

means that training $G$ to minimize this objective can result in a function $G$ such that, if you draw a random image $x \sim p_{data}(x)$ and compute $G(x)$, the probability of obtaining any particular output image $y$ is also $\frac{1}{N}$. That is:

$$p(\hat{y} = y) = \mathbb{E}_x\left[p(G(x) = y)\right] = \frac{1}{N}$$
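A tiny simulation may make this concrete. The toy "dataset" and the bijective generator `G` below are illustrative stand-ins, not anything from the CycleGAN code: if $x$ is drawn uniformly from $N$ training images and $G$ maps each input to a distinct output, the induced output distribution is also uniform with probability $\frac{1}{N}$.

```python
import random
from collections import Counter

random.seed(0)

# Toy dataset of N = 4 "images", each shown with probability 1/N.
X = ["x0", "x1", "x2", "x3"]

# Hypothetical bijective generator: distinct input -> distinct output.
G = {"x0": "y2", "x1": "y0", "x2": "y3", "x3": "y1"}

# Draw x ~ p_data(x) many times and record G(x).
draws = 100_000
counts = Counter(G[random.choice(X)] for _ in range(draws))

# Empirical frequency of each output: should be close to 1/N = 0.25.
freqs = {y: c / draws for y, c in counts.items()}
```

If `G` instead collapsed several inputs onto one output (mode collapse), the output frequencies would no longer be uniform, which is exactly the failure the adversarial objective is meant to discourage.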

Lee Reeves
  • Your response is absolutely right. I would only like to remark that usually people don't actually compute $p_{data}(y) = 1/N$. Rather, they sample $y \sim p_{data}$ which corresponds to selecting a random batch from your dataset. This suffices, for instance, for computing averages of functions over y. – Eduardo Montesuma Jun 10 '22 at 15:01
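The comment's point, sketched below with illustrative names (`dataset`, `f` are stand-ins, not from any CycleGAN implementation): in practice nobody tabulates $p_{data}(y) = \frac{1}{N}$ explicitly; drawing a random batch $y \sim p_{data}$ and averaging $f$ over it gives a Monte Carlo estimate of $\mathbb{E}_{y \sim p_{data}}[f(y)]$.

```python
import random

random.seed(0)

# Stand-in for N = 10 training images.
dataset = list(range(10))

# Any function of y (in CycleGAN, e.g. a discriminator loss term).
f = lambda y: y * y

# Exact expectation under the uniform empirical distribution p_data.
exact = sum(f(y) for y in dataset) / len(dataset)

# Monte Carlo: sample a batch y ~ p_data and average f over it.
batch = random.choices(dataset, k=5000)
estimate = sum(f(y) for y in batch) / len(batch)
```

The batch average converges to the exact expectation as the batch size grows, which is why sampling batches suffices for training.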