
Every explanation of variational inference starts with the same basic premise: given an observed variable $x$, and a latent variable $z$,

$$ p(z|x)=\frac{p(x,z)}{p(x)} $$

and then proceeds to expand $p(x)$ as an expectation over $z$:

$$ p(x) = \int{p(x,z)dz} $$

and then states that it's too difficult to evaluate.
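To see concretely why this marginal is a genuine distribution over the values of $x$ (rather than identically 1), here is a minimal sketch with a discrete latent variable; the joint probabilities are made up for illustration:

```python
import numpy as np

# Hypothetical joint p(x, z) over 2 latent states and 3 observable values.
# Rows index z, columns index x; all entries together sum to 1.
p_xz = np.array([
    [0.10, 0.25, 0.05],  # z = 0
    [0.30, 0.10, 0.20],  # z = 1
])

# Marginal p(x) = sum_z p(x, z): one probability per value of x.
p_x = p_xz.sum(axis=0)

print(p_x)        # approximately [0.40, 0.35, 0.25] -- no entry is 1
print(p_x.sum())  # approximately 1.0 -- the probabilities over all x sum to 1
```

Each value of $x$ gets its own marginal probability; only the sum over all values of $x$ equals 1.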

My very basic question: why is $p(x)$ not simply equal to 1? It's an observed variable!

nbro

1 Answer


You're forgetting that $x$ can take several values, each with its own probability. Say $x$ represents the roll of a fair die. Then $p(x)$ is 1/6 for each of the six possible values of $x$.

$$ p(\theta|x)=\frac{p(x|\theta)p(\theta)}{p(x)} $$

If you rearrange the formula, you can see what it means for the prior $p(\theta)$ and the posterior $p(\theta|x)$ to match:

$$ \frac{p(\theta|x)}{p(\theta)}=1=\frac{p(x|\theta)}{p(x)} $$

because if the left-hand side equals 1, then the likelihood predicted by our parameters, $p(x|\theta)$, equals the true observed probability $p(x)$. So if we train a model on some observed die rolls, we expect a perfect model to learn, for each face of the die, the true probability $p(x) = 1/6$, not 1.
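As a small illustration of that last point (the simulation setup here is hypothetical): the maximum-likelihood estimate for a categorical model is just the empirical frequency of each face, and on simulated fair-die rolls it recovers probabilities near 1/6 for every face, never 1.

```python
import random
from collections import Counter

random.seed(0)
rolls = [random.randint(1, 6) for _ in range(60_000)]  # simulated fair-die rolls

# Maximum-likelihood estimate for a categorical model:
# p_hat(x) = count(x) / n, the empirical frequency of each face.
counts = Counter(rolls)
p_hat = {face: counts[face] / len(rolls) for face in range(1, 7)}

for face, p in sorted(p_hat.items()):
    print(face, round(p, 3))  # each estimate is close to 1/6, not 1
```

The estimates sum to 1 across the six faces, which is exactly the sense in which $p(x)$ is a distribution over outcomes rather than the number 1.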

Edoardo Guerriero
  • Thank you for the explanation. I guess the short answer is that contrary to the explanations I've read, here $x$ is NOT assumed to be observed, but is treated as a random variable. (btw. in your second formula, surely you didn't mean that the ratios are equal to 1 - it's only when $\theta$ and $x$ are independent). – Abrrval Aug 04 '22 at 17:22
  • 1
    I guess it is semantics on what we mean by 'observed'. Say we rolled a dice and obtained the number 6, then we have simulated a realisation from our random variable $X$ and _observed_ the outcome of this one particular roll of the die. In the literature, when they say $x$ is observed, they typically mean that the random variable $X$ relates to something that we can observe (as opposed to the latent variable, which, by definition, is unobservable). So, whilst we have _observed_ that the outcome of this one roll of the die was 6, it does not mean it will always be the case ... – David Sep 03 '22 at 21:27
  • 1
    ... and so we use the probability of the outcome of that roll when doing variational inference. – David Sep 03 '22 at 21:28