Forward Diffusion Process Derivation In Diffusion Models

Question

In papers and other material on diffusion models, the forward diffusion process is defined by adding a small amount of Gaussian noise to an image $x_0$ over $T$ time steps. At each time step the noise has variance $\beta_t$. This process produces a sequence of noisy samples $x_1, x_2, \dots, x_T$ such that: $q(x_t|x_{t-1}) = N(x_t; \sqrt{1-\beta_t} x_{t-1}, \beta_t I)$

I don't understand why this is the distribution $q(x_t|x_{t-1})$. When adding a constant $c$ to a normal random variable with mean $\mu$ and variance $\sigma^2$, we get a new random variable with the same variance and a mean of $c+\mu$. Therefore, I expect $x_t = x_{t-1} + \epsilon_t$ with $\epsilon_t \sim N(0, \beta_t I)$, which would give $q(x_t|x_{t-1}) = N(x_t; x_{t-1}, \beta_t I)$.

Any help will be appreciated.

score 1 · Answer 1 · answered Jan 11 '23 at 02:00

The relationship between $x_t$ and $x_{t-1}$ is as follows: $$ x_t = \sqrt{1-\beta_t}x_{t-1}+\sqrt{\beta_t}\epsilon_t,\quad \epsilon_t\sim\mathcal{N}(0,I). $$ Not only is a small amount of noise added, the original image is also scaled down slightly.
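One way to see where the factor $\sqrt{1-\beta_t}$ comes from is to track the variance. This is only a sketch, and it assumes (beyond what is stated above) that each pixel of $x_{t-1}$ has zero mean and unit variance and that $\epsilon_t$ is independent of $x_{t-1}$:

$$
\begin{aligned}
\operatorname{Var}(x_t) &= (1-\beta_t)\operatorname{Var}(x_{t-1}) + \beta_t\operatorname{Var}(\epsilon_t) \\
&= (1-\beta_t)\cdot 1 + \beta_t\cdot 1 = 1 .
\end{aligned}
$$

Under the same assumptions, the unscaled update $x_t = x_{t-1} + \sqrt{\beta_t}\epsilon_t$ would instead give $\operatorname{Var}(x_t) = 1 + \beta_t$, so the variance would grow at every step rather than staying fixed.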

Patrick Johnstone

score 0 · Answer 2 · answered Dec 09 '22 at 19:40

The image data $\mathbf{x}_{t-1}$ is not a constant $\mathbf{c}$. It is itself a random variable with its own distribution: different pixel configurations have different probabilities.

Eureka Zheng

Can you please expand this answer? How is the $\sqrt{1-\beta_t}$ derived? – PascalIv Jan 04 '23 at 14:08

score 0 · Answer 3 · answered Apr 12 '23 at 14:55

The purpose of $q(x_t|x_{t-1})$ is: given that the random variable $x_{t-1}$ is sampled to be a specified value, what is the probability density function for $x_t$? So we're using the sampled value of $x_{t-1}$ to calculate the mean of the probability density function, which is $\sqrt{1-\beta_t} x_{t-1}$. We're not adding $x_{t-1}$ and $\epsilon_t$. Instead, we are steering the mean towards zero a little bit each time step (because $\sqrt{1-\beta_t}<1$), while adding a little noise.

To implement this, we execute the sampling of $x_t$ by taking the mean (the constant $\sqrt{1-\beta_t} x_{t-1}$) and adding a sample from the standard normal distribution scaled by $\sqrt{\beta_t}$.
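The sampling step just described can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `forward_step`, the toy constant "image", and the linear $\beta_t$ schedule from $10^{-4}$ to $0.02$ over 1000 steps are all assumptions chosen for the example.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I) (hypothetical helper)."""
    eps = rng.standard_normal(x_prev.shape)  # eps ~ N(0, I)
    # Mean is the scaled previous sample; noise is scaled by sqrt(beta_t).
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

rng = np.random.default_rng(0)

# Toy "image": 20,000 pixels that all equal 2.0 (illustrative only).
x0 = np.full(20_000, 2.0)

# A single step with beta = 0.01: the mean shrinks slightly, the variance is beta.
x1 = forward_step(x0, 0.01, rng)

# Many steps with an assumed linear schedule: the sample drifts toward N(0, I).
betas = np.linspace(1e-4, 0.02, 1000)
x = x0
for beta_t in betas:
    x = forward_step(x, beta_t, rng)

print("one step:   mean=%.4f var=%.4f" % (x1.mean(), x1.var()))
print("1000 steps: mean=%.4f var=%.4f" % (x.mean(), x.var()))
```

After the single step the empirical mean should sit near $2\sqrt{0.99}$ with variance near $0.01$, and after the full loop the pixels should look approximately standard normal regardless of the starting value.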

Applying the chain rule of probability across all $T$ steps, and integrating out the intermediate samples, then gives the desired (unconditional) pdf for $x_T$ at the final time.
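As a worked sketch of how the steps compose, introduce the shorthand $\alpha_t = 1-\beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$ (this notation is an assumption added here; it does not appear in the question). Unrolling two steps, with independent $\epsilon_{t-1}, \epsilon_t \sim \mathcal{N}(0, I)$:

$$
\begin{aligned}
x_t &= \sqrt{\alpha_t}\,x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t \\
&= \sqrt{\alpha_t\alpha_{t-1}}\,x_{t-2} + \sqrt{\alpha_t(1-\alpha_{t-1})}\,\epsilon_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t .
\end{aligned}
$$

The two noise terms are independent zero-mean Gaussians, so their sum is Gaussian with variance $\alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = 1-\alpha_t\alpha_{t-1}$. Repeating the argument down to $x_0$ suggests

$$
q(x_t|x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\right),
$$

so if the schedule drives $\bar\alpha_T$ close to zero, $x_T$ is approximately $\mathcal{N}(0, I)$ whatever the starting image was.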
