Questions tagged [variational-inference]

For questions related to variational inference (VI), an optimization-based approach to the inference problem, i.e. the computation of the posterior distribution, which by Bayes' rule is determined by the prior, the likelihood, and the marginal likelihood (evidence). VI is used, for example, in the context of variational auto-encoders (VAEs) and Bayesian neural networks (BNNs).

For more information, see the paper Variational Inference: A Review for Statisticians (2017) by David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe.
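As a rough sketch of the "optimization-based" idea (the family $\mathcal{Q}$ and the decomposition below are the standard textbook formulation, not something specific to this tag wiki): VI picks an approximating family $\mathcal{Q}$ and minimizes the KL divergence to the exact posterior, which is equivalent to maximizing the evidence lower bound (ELBO),

$$ q^{*}(z) = \arg\min_{q \in \mathcal{Q}} \text{KL}\big(q(z)\,\|\,p(z\mid x)\big) = \arg\max_{q \in \mathcal{Q}} \Big( \mathbb{E}_{q(z)}\big[\log p(x\mid z)\big] - \text{KL}\big(q(z)\,\|\,p(z)\big) \Big), $$

since the evidence $\log p(x)$ does not depend on $q$.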

12 questions
5
votes
1 answer

What is the intuition behind variational inference for Bayesian neural networks?

I'm trying to understand the concept of variational inference for BNNs. My source is this work. The aim is to minimize the divergence between the approximate distribution and the true posterior $$\text{KL}(q_{\theta}(w)\,\|\,p(w|D)) = \int q_{\theta}(w) \…
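For readers skimming this excerpt: a standard identity behind that objective (written in the excerpt's notation, with $p(D\mid w)$ and $p(D)$ used here for the likelihood and the evidence) is

$$ \text{KL}\big(q_{\theta}(w)\,\|\,p(w\mid D)\big) = \mathbb{E}_{q_{\theta}(w)}\big[\log q_{\theta}(w) - \log p(D\mid w) - \log p(w)\big] + \log p(D), $$

so minimizing the KL over $\theta$ is the same as maximizing the ELBO, because $\log p(D)$ does not depend on $\theta$.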
3
votes
1 answer

What does the approximate posterior on latent variables, $q_\phi(z|x)$, tend to when optimising VAEs?

The ELBO objective is described as follows $$ \text{ELBO}(\phi,\theta) = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta (x|z)] - \text{KL}[q_\phi (z|x)\,\|\,p(z)] $$ This form of the ELBO includes a regularisation term in the form of the KL divergence, which drives $q_\phi(z|x)…
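A standard rearrangement of that objective (not specific to this question) makes the limiting behaviour easier to see:

$$ \text{ELBO}(\phi,\theta) = \log p_\theta(x) - \text{KL}\big[q_\phi(z\mid x)\,\|\,p_\theta(z\mid x)\big], $$

so, for fixed $\theta$, maximizing the ELBO over $\phi$ drives $q_\phi(z\mid x)$ toward the true posterior $p_\theta(z\mid x)$.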
2
votes
1 answer

Why don't we also need to approximate $p(x \mid z)$ in the VAE?

In the VAE, we approximate the probability distribution $p(z \mid x)$, where $z$ is the latent vector and $x$ is our data. The reason is that $p(z \mid x)$ becomes impossible to calculate for continuous data because of $p(x)$, which require…
2
votes
1 answer

How does the VAE learn a joint distribution?

I found that the following paragraph from An Introduction to Variational Autoencoders sounds relevant, but I do not fully understand it. A VAE learns stochastic mappings between an observed $\mathbf{x}$-space, whose empirical distribution…
1
vote
1 answer

If we know the joint distribution, can we simply derive the evidence from it?

I'm struggling to understand one specific part of the formalism of the free energy principle. My understanding is that the free energy principle can be derived from considering statistical dynamics of a system that is coupled with its environment in…
1
vote
2 answers

Why optimise $\log p(x)$ rather than $\log p(x|z)$ in a Variational AutoEncoder?

Background The loss function in a Variational AutoEncoder is the Evidence Lower Bound (ELBO): $\mathbb{E}_q[\log p(x|z)] - \text{KL}[q(z)\,\|\,p(z)]$, which satisfies this inequality: $\log p(x) \ge \mathbb{E}_q[\log p(x|z)] - \text{KL}[q(z)\,\|\,p(z)]$. It is said in the…
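For context, the quoted inequality follows from the non-negativity of the KL divergence between $q(z)$ and the true posterior:

$$ \log p(x) = \underbrace{\mathbb{E}_q[\log p(x\mid z)] - \text{KL}[q(z)\,\|\,p(z)]}_{\text{ELBO}} + \text{KL}\big[q(z)\,\|\,p(z\mid x)\big] \;\ge\; \mathbb{E}_q[\log p(x\mid z)] - \text{KL}[q(z)\,\|\,p(z)]. $$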
1
vote
0 answers

Why do we use $q_{\phi}(z \mid x^{(i)})$ in the objective function of amortized variational inference, while sometimes we use $q(z)$?

On page 21 here, it states: General Idea of Amortization: if same inference problem needs to be solved many times, can we parameterize a neural network to solve it? Our case: for all $x^{(i)}$ we want to solve: $$ \min _{q(z)} \mathrm{KL}\left(q(z)…
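A rough way to read that distinction (with $\lambda^{(i)}$ as a hypothetical name for per-example variational parameters, not the notation of the linked slides): classic VI solves a separate optimization problem for every data point, whereas amortized VI fits one network $q_\phi$ shared across all of them,

$$ \text{classic: } \min_{\lambda^{(i)}} \mathrm{KL}\big(q_{\lambda^{(i)}}(z)\,\|\,p(z\mid x^{(i)})\big) \text{ for each } i, \qquad \text{amortized: } \min_{\phi} \sum_i \mathrm{KL}\big(q_\phi(z\mid x^{(i)})\,\|\,p(z\mid x^{(i)})\big). $$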
1
vote
1 answer

Do we use two distinct layers to compute the mean and variance of a Gaussian encoder/decoder in the VAE?

I am looking at appendix C of the VAE paper: It says: C.1 Bernoulli MLP as decoder In this case let $p_{\boldsymbol{\theta}}(\mathbf{x} \mid \mathbf{z})$ be a multivariate Bernoulli whose probabilities are computed from $\mathrm{z}$ with a…
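For readers of that appendix: the Gaussian MLP it goes on to describe uses one shared hidden layer followed by two separate output layers, one for the mean and one for the log-variance. A minimal NumPy sketch of that idea (layer sizes, weight names, and initialization here are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes are illustrative assumptions, not values from the paper.
d_in, d_hidden, d_latent = 784, 200, 20

# One shared hidden layer ...
W_h = rng.normal(scale=0.01, size=(d_hidden, d_in)); b_h = np.zeros(d_hidden)
# ... and two distinct output layers: one for the mean, one for the log-variance.
W_mu = rng.normal(scale=0.01, size=(d_latent, d_hidden)); b_mu = np.zeros(d_latent)
W_lv = rng.normal(scale=0.01, size=(d_latent, d_hidden)); b_lv = np.zeros(d_latent)

def gaussian_mlp(x):
    """Map an input vector to the mean and log-variance of a diagonal Gaussian."""
    h = np.tanh(W_h @ x + b_h)
    return W_mu @ h + b_mu, W_lv @ h + b_lv

x = rng.normal(size=d_in)             # stand-in for one data point
mu, log_var = gaussian_mlp(x)
eps = rng.normal(size=d_latent)
z = mu + np.exp(0.5 * log_var) * eps  # reparameterized sample
```

The same two-headed structure is used whether the network plays the role of the encoder (inputs $x$, outputs parameters of $q_\phi(z\mid x)$) or of a Gaussian decoder (inputs $z$, outputs parameters of $p_\theta(x\mid z)$).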
0
votes
0 answers

What is $p(Z)$ and what happens to the variational posterior $q(Z;X)$ during data synthesis (after training)?

From my understanding of inference problems, we want to compute the posterior $p(Z|X=D)$, for some observed dataset $D=(x^1, x^2,\dots,x^n)$ of $n$ independent observations, in order to "update" our prior $p(Z)$ for further analysis/data generation.…
0
votes
0 answers

What's the best way of inferring the probability of heads with a coin of unknown bias that changes regularly?

Hi, what would be the best strategy to infer the range of probabilities of getting heads with a coin of unknown bias that is variable? I'm working on a similar problem with a game AI: an AI to play a game that consists of multiple nodes…
0
votes
1 answer

Why isn't the evidence $p(x) = 1$ if it's an observed variable?

Every explanation of variational inference starts with the same basic premise: given an observed variable $x$, and a latent variable $z$, $$ p(z|x)=\frac{p(x,z)}{p(x)} $$ and then proceeds to expand $p(x)$ as an expectation over $z$: $$ p(x) =…
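For anyone landing here from the excerpt: the expansion it is about to quote is the usual marginalization over the latent variable (written for a continuous $z$),

$$ p(x) = \int p(x\mid z)\,p(z)\,dz = \mathbb{E}_{p(z)}\big[p(x\mid z)\big], $$

i.e. the marginal likelihood of the observed value. Observing $x$ fixes this number, but it does not make it equal to $1$.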
0
votes
0 answers

Tensorflow Probability Implementation of Automatic Differentiation Variational Inference with Mixtures

In this paper, the authors suggest using the following loss instead of the traditional ELBO in order to train what is basically a Variational Autoencoder with a Gaussian Mixture Model instead of a single, normal…