I need to write down the formal loss function of my network, which follows the WGAN-GP model. The discriminator takes 3 consecutive images as input (for example, 3 consecutive frames of a video) and must judge whether the middle image is a plausible frame between the first and the third.
I came up with something like this, but is it correct to say that x1, x2 and x3 are drawn from Pr even though they are 3 consecutive images? Only the first is chosen at random; the other two are simply the next two frames.
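A minimal sketch of this sampling procedure in plain Python, with integers standing in for frames. The names `sample_triplet` and `sample_batch` are hypothetical. It shows that drawing one random start index and then taking the next two frames produces a single sample (x1, x2, x3) from a joint distribution, not three independent draws from Pr:

```python
import random
from typing import Sequence, List, Tuple, TypeVar

Frame = TypeVar("Frame")  # any frame representation (here: integers as stand-ins)


def sample_triplet(video: Sequence[Frame], rng: random.Random) -> Tuple[Frame, Frame, Frame]:
    """Draw one joint sample (x1, x2, x3) of consecutive frames from one video.

    Only the start index t is random; x2 and x3 are fully determined by t.
    """
    if len(video) < 3:
        raise ValueError("a video needs at least 3 frames")
    t = rng.randrange(len(video) - 2)  # uniform over all valid start positions
    return video[t], video[t + 1], video[t + 2]


def sample_batch(
    videos: List[Sequence[Frame]], batch_size: int, rng: random.Random
) -> List[Tuple[Frame, Frame, Frame]]:
    """Build a batch by picking a video uniformly at random, then a triplet inside it."""
    batch = []
    for _ in range(batch_size):
        video = rng.choice(videos)  # each video is equally likely, regardless of length
        batch.append(sample_triplet(video, rng))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    x1, x2, x3 = sample_triplet(list(range(10)), rng)
    print(x1, x2, x3)  # three consecutive integers
```

The generator's conditioning pair (x1, x3) is then the same sample with x2 dropped, which is a draw from the marginal p_r(x1, x3).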
EDIT:
EDIT 2:
I replaced Pr with p_r(x1, x3) and p_r(x1, x2, x3) to make explicit that x2 and x3 are the frames that follow x1, so they depend on the choice of x1. Is this formulation more correct?
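For reference, here is one way the WGAN-GP critic loss could be written with these joint distributions. This is a sketch under two assumptions not stated above: a generator $G(x_1, x_3)$ produces the middle frame, and the gradient penalty is taken only with respect to the middle frame, with $x_1$ and $x_3$ held fixed:

$$
\begin{aligned}
L_D ={}& \mathbb{E}_{(x_1, x_3) \sim p_r(x_1, x_3)}\big[D(x_1, G(x_1, x_3), x_3)\big]
- \mathbb{E}_{(x_1, x_2, x_3) \sim p_r(x_1, x_2, x_3)}\big[D(x_1, x_2, x_3)\big] \\
&+ \lambda\, \mathbb{E}_{(x_1, x_2, x_3) \sim p_r(x_1, x_2, x_3),\ \epsilon \sim U[0, 1]}\Big[\big(\lVert \nabla_{\hat{x}_2} D(x_1, \hat{x}_2, x_3) \rVert_2 - 1\big)^2\Big], \\
\hat{x}_2 ={}& \epsilon\, x_2 + (1 - \epsilon)\, G(x_1, x_3), \qquad
p_r(x_1, x_3) = \int p_r(x_1, x_2, x_3)\, dx_2, \\
L_G ={}& -\mathbb{E}_{(x_1, x_3) \sim p_r(x_1, x_3)}\big[D(x_1, G(x_1, x_3), x_3)\big].
\end{aligned}
$$

Writing $p_r(x_1, x_3)$ as the marginal of $p_r(x_1, x_2, x_3)$ makes it explicit that both expectations use the same way of sampling the outer frames. If the generator also takes a noise vector $z \sim p(z)$, that expectation would be added to the first term and to $L_G$.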