
Background

Generative modeling
Generative modeling aims to model the probability of observing a data point x: $$ p(x) = \frac{p(y\cap x)}{p(y|x)} $$
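(For reference, this identity is just the definition of conditional probability rearranged: $$ p(y \mid x) = \frac{p(y \cap x)}{p(x)} \quad\Longrightarrow\quad p(x) = \frac{p(y \cap x)}{p(y \mid x)}, $$ where $y$ is some second variable considered jointly with $x$.)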

Representation Learning

Instead of trying to model the high-dimensional sample space directly, we describe each observation in the training set using some lower-dimensional latent space, and then learn a mapping function that can take a point in the latent space and map it to a point in the original domain.


Context:

Let’s suppose we have a training set consisting of grayscale images of biscuit tins.

We can convert each image of a tin to a point in a latent space of just two dimensions. We would first need to establish the two latent-space dimensions that best describe this dataset (height and width), then learn the mapping function f that can take a point in this space and map it to a grayscale biscuit-tin image.
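As a purely illustrative sketch (not from the book), such a mapping function f could be a small neural decoder that takes a 2-D latent point and produces a 64×64 grayscale image. The architecture, layer sizes, and example latent point below are all assumptions made up for illustration:

```python
# Hypothetical sketch of a mapping function f: 2-D latent point -> grayscale image.
# Everything here (architecture, sizes, the example point) is made up for illustration.
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, latent_dim: int = 2, image_size: int = 64):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, image_size * image_size),
            nn.Sigmoid(),  # squash outputs to [0, 1] pixel intensities
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.net(z)
        return x.view(-1, 1, self.image_size, self.image_size)

decoder = Decoder()
z = torch.tensor([[0.7, 0.3]])  # a point in the 2-D latent space (height, width)
image = decoder(z)              # shape (1, 1, 64, 64): a grayscale image
```

In practice the decoder's weights would be learned from the training images (for example, as the decoder half of an autoencoder), so that nearby latent points map to similar-looking tins; the untrained network above only shows the shape of the mapping.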


Question:

What is:

P(y|x): is it the probability of a point lying in the latent space, given that it is a point in the (high-dimensional) pixel space?

Or for that matter, what do x, y, P(x), P(y), P(x|y) really mean/represent here?

P.S.: this is from the book Generative Deep Learning, 2nd edition, by David Foster (O'Reilly).

Prakhar

1 Answer


"What is: x, y, P(x), P(y), P(x|y), P(y|x) in this example?" I don't really see where you are referring to them in the question, so I'll answer with the general interpretation in ML:

  • $P(x)$: also known as the data distribution, it's the density that describes the data. Take your example: if your tins are described by height and width, you could assume that the height is distributed as $H\sim U(1,10)$ (uniform) and the width as $W \sim U(1,5)$; supposing they are independent, the joint distribution of $H$ and $W$ is $P(x)$. (Now, if you are dealing with images, you have to consider the joint distribution of all pixels, so it's a very high-dimensional distribution.)
  • $P(y)$: also known as the target distribution. This can also be considered a "prior", since in discriminative tasks you try to model $P(y|x)$. It describes your belief about the target given no additional information. Take, for example, a task where you try to regress the weight of a person: you know that, generally speaking, weight is distributed as a Gaussian (look up statistics on this and you will see that bell curve) centered at some average weight.
  • $P(y|x)$: the conditional probability that you want to learn in discriminative tasks. It means "given this information $x$, how likely is $y$ to be some value?". Take the weight-regression task: say that, in general, weight is distributed as $N(50,10)$, so given no additional information the best guess is that a person weighs 50 kg. However, if I additionally tell you that this person is 190 cm tall, then you know that they most likely weigh more than average, so your belief about their weight might shift to $N(70,5)$. This "shift" is the information brought by $x$; modeling $p(y|x)$ is just classification/regression, etc.
  • $P(x|y)$: the other way around. If we consider images, $y$ might be, for example, the digit you want to see in the image, and $x$ the image you want to generate; then $p(x|y)$ is just "which images are more likely to be correct, given that I want to see the digit $y$ printed on them?". You can consider that $p(x)$ is already a distribution and $p(x|y)$ just takes the interesting part of it where $y$ is printed on the image.

So, in your example, $P(x)$ is just the distribution of the images, $P(y)$ is the distribution of some classification over those tins (say, whether a cookie will fit in one: your general belief about whether a cookie fits in a random tin or not), $P(y|x)$ is "given this image of a tin, will a cookie fit in it?", and $P(x|y)$ is "generate a picture of a possible tin such that this cookie $y$ fits in it" (see the numerical sketch below).
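To make this concrete, here is a small numerical sketch with made-up numbers: the uniform ranges above and the "cookie of diameter 3" rule are assumptions for illustration, not from the book or the answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# P(x): the data distribution over tins x = (height, width),
# here assumed to be two independent uniforms, H ~ U(1, 10) and W ~ U(1, 5)
heights = rng.uniform(1, 10, size=100_000)
widths = rng.uniform(1, 5, size=100_000)

# y: "a cookie of diameter 3 fits in the tin" (a toy, deterministic rule)
fits = (heights > 3) & (widths > 3)

# P(y): the marginal ("prior") belief that a cookie fits in a random tin
print("P(y = fits) ~", fits.mean())

# P(y|x): the same belief after observing one particular tin x
x = (8.0, 4.5)  # observed (height, width)
print("P(y = fits | x = (8.0, 4.5)) =", float(x[0] > 3 and x[1] > 3))

# P(x|y): the distribution over tins among those in which the cookie fits;
# a generative model conditioned on y would sample from this
fitting_tins = np.stack([heights[fits], widths[fits]], axis=1)
print("mean (height, width) of tins that fit:", fitting_tins.mean(axis=0))
```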

Alberto
  • Edited/rectified the question to better reflect the intent. Thanks for taking the time to explain. Could you please update the answer accordingly? I wish to understand what the probability terms represent in the biscuit-tin example. – Prakhar Aug 20 '23 at 10:36
  • @Prakhar I still don't quite get what you are referring to with those letters; however, I have a sense that you are probably referring to a Variational Autoencoder, where y would be z/h depending on the book you are reading? – Alberto Aug 20 '23 at 12:11