EM (Expectation-Maximization)
Target: maximize $p_\theta(x)$
$p_\theta(x)=\frac{p_\theta(x, z)}{p_\theta(z \mid x)}$
Take log on both sides:
$\log p_\theta(x)=\log p_\theta(x, z)-\log p_\theta(z \mid x)$
Introduce distribution $q_\phi(z)$:
$\log p_\theta(x)=\log \frac{p_\theta(x, z)}{q_\phi(z)}-\log \frac{p_\theta(z \mid x)}{q_\phi(z)}$

Take the expectation over $q_\phi(z)$ on both sides (the left side is unchanged because $\log p_\theta(x)$ does not depend on $z$ and $q_\phi(z)$ integrates to 1):

$\int_z q_\phi(z) \log p_\theta(x)\, d z=\int_z q_\phi(z) \log \frac{p_\theta(x, z)}{q_\phi(z)}\, d z-\int_z q_\phi(z) \log \frac{p_\theta(z \mid x)}{q_\phi(z)}\, d z$

$\log p_\theta(x)=\underbrace{\int_z q_\phi(z) \log \frac{p_\theta(x, z)}{q_\phi(z)}\, d z}_{ELBO}+\underbrace{\int_z q_\phi(z) \log \frac{q_\phi(z)}{p_\theta(z \mid x)}\, d z}_{KL\left(q_\phi(z) \| p_\theta(z \mid x)\right)}$
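This identity holds for any choice of $q_\phi(z)$. As a sanity check, here is a minimal numeric verification on a made-up three-state discrete model (all probabilities below are illustrative assumptions, not part of the derivation):

```python
import numpy as np

# Toy discrete model: z takes 3 values; the numbers are made up for illustration.
p_z = np.array([0.5, 0.3, 0.2])          # prior p(z)
p_x_given_z = np.array([0.9, 0.4, 0.1])  # likelihood p(x|z) for one observed x

p_xz = p_z * p_x_given_z                 # joint p(x, z)
p_x = p_xz.sum()                         # evidence p(x) = sum_z p(x, z)
p_post = p_xz / p_x                      # posterior p(z|x)

q = np.array([0.6, 0.3, 0.1])            # an arbitrary variational distribution q(z)

elbo = np.sum(q * np.log(p_xz / q))      # E_q[log p(x,z) - log q(z)]
kl = np.sum(q * np.log(q / p_post))      # KL(q(z) || p(z|x))

print(np.log(p_x), elbo + kl)            # equal up to floating-point error
```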
Our target is to maximize $\log p_\theta(x)$, and the idea of EM is to make $KL = 0$ (by setting $q_\phi(z)=p_\theta(z \mid x)$) so that $\log p_\theta(x)=ELBO$, and then maximize $\log p_\theta(x)$ by maximizing the $ELBO$.
- Rewrite the ELBO algebraically:
$ELBO=\int q_\phi(z) \log \frac{p_\theta(x, z)}{q_\phi(z)}\, d z=\int q_\phi(z) \log p_\theta(x, z)\, d z-\int q_\phi(z) \log q_\phi(z)\, d z$
$q_\phi(z)$ is computed in the E-step ($q_\phi(z)=p_{\theta^{(t)}}(z \mid x)$), so while maximizing the $ELBO$ over $\theta$ we can treat $\int q_\phi(z) \log q_\phi(z)\, d z$ as a constant.
So to maximize the $ELBO$, we only need to maximize $\int q_\phi(z) \log p_\theta(x, z)\, d z=E_{q_\phi(z)}[\log p_\theta(x, z)]$, which is the M-step.
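To make the two steps concrete, here is a minimal EM sketch for a two-component 1-D Gaussian mixture (the synthetic data and starting parameters are illustrative assumptions): the E-step sets $q_\phi(z)=p_{\theta^{(t)}}(z \mid x)$ exactly, and the M-step maximizes $E_{q_\phi(z)}[\log p_\theta(x, z)]$, which has a closed form here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two Gaussians (purely for illustration).
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Initial parameters theta = (pi, mu, sigma) for K = 2 components.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: q(z) = p_theta(z|x), the exact posterior responsibilities,
    # which drives KL(q || p_theta(z|x)) to zero.
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)   # shape (N, K)

    # M-step: maximize E_q[log p_theta(x, z)] in closed form.
    n_k = r.sum(axis=0)
    pi = n_k / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)

print(pi, mu, sigma)  # should approach the generating parameters
```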
VAE (Variational Autoencoder)
The EM algorithm above has an important premise: $p_{\theta^{(t)}}(z \mid x)$ must be tractable, so that we can simply set $q_\phi(z)=p_{\theta^{(t)}}(z \mid x)$ to make $KL = 0$. But if $p_{\theta^{(t)}}(z \mid x)$ is intractable, we have to use variational inference to make $q_\phi(z)$ approximate $p_{\theta^{(t)}}(z \mid x)$.
VAE minimizes $KL\left(q_\phi(z) \| p_\theta(z \mid x)\right)$ by maximizing the $ELBO$. Rewriting the ELBO with $p_\theta(x, z)=p_\theta(x \mid z)\, p_\theta(z)$:
$-ELBO=-\int q_\phi(z) \log p_\theta(x \mid z)\, d z+\underbrace{\int q_\phi(z) \log \frac{q_\phi(z)}{p_\theta(z)}\, d z}_{KL\left(q_\phi(z) \| p_\theta(z)\right)}$
Then we can optimize this with gradient descent (in practice using the reparameterization trick so gradients can flow through the sampling of $z$).
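For concreteness, here is a minimal PyTorch sketch of this objective, assuming a Gaussian encoder $q_\phi(z \mid x)$, a Bernoulli decoder $p_\theta(x \mid z)$, and a standard-normal prior (all layer sizes and the random batch are made-up illustrations): the loss is the negative ELBO, i.e. a reconstruction term plus $KL\left(q_\phi(z) \| p_\theta(z)\right)$ in closed form.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    # Minimal sketch: Gaussian encoder q_phi(z|x), Bernoulli decoder p_theta(x|z).
    # The sizes (784, 200, 20) are illustrative assumptions.
    def __init__(self, x_dim=784, h_dim=200, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, logvar

def neg_elbo(x_logits, x, mu, logvar):
    # -E_q[log p_theta(x|z)]: reconstruction term, one-sample Monte Carlo estimate.
    rec = nn.functional.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # KL(q_phi(z|x) || p_theta(z)): closed form for Gaussian q and standard-normal prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl  # minimizing this maximizes the ELBO

# Usage: one gradient step on a random batch (fake data, just to show the loop).
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)
logits, mu, logvar = model(x)
loss = neg_elbo(logits, x, mu, logvar)
opt.zero_grad()
loss.backward()
opt.step()
```

Note that a single gradient step on `neg_elbo` updates the encoder parameters $\phi$ and the decoder parameters $\theta$ jointly.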
My question is: the target of VAE is to maximize the $ELBO$ and thus minimize $KL\left(q_\phi(z) \| p_\theta(z \mid x)\right)$, which is the same as the E-step of the EM algorithm. However, the sum of the $ELBO$ and the $KL$ stays fixed, so this alone cannot increase $\log p_\theta(x)$. Is VAE then just the E-step of the EM algorithm? How does it manage to fit the distribution of $x$ (increase $\log p_\theta(x)$)?