In the original CycleGAN paper, on the second page, there is a sentence that I don't quite understand:
"In theory, this objective can induce an output distribution over $\hat{y}$ that matches the empirical distribution $p_{\text{data}}(y)$ (in general, this requires $G$ to be stochastic) [16]."
What does $p_{\text{data}}(y)$ denote, and what does it mean for it to be an empirical distribution? I have trouble picturing it.
The loss functions also contain $x \sim p_{\text{data}}(x)$, and I don't understand what that notation means there either.
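For reference, this is the adversarial loss as I read it in the paper (transcribed by hand, so the notation may differ slightly from the original):

$$
\mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right]
$$

Both expectations are taken over samples drawn from $p_{\text{data}}$, which is the part I can't interpret.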
Could anyone please explain this sentence and the notation to me?