How is the variance for a diffusion kernel derived for a diffusion model?

Question

So I'm watching this video tutorial from CVPR this year on diffusion models, and I am confused by the variance term in the distribution shown on the left in the video. I understand that in the forward process, we can track the intermediate distributions

$$q(\mathbf{x}_t|\mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t;\sqrt{1-\beta_t}\mathbf{x}_{t-1},\beta_t\mathbf{I})$$

And that the joint distribution of all the intermediate steps conditioned on the input is given by

$$q(\mathbf{x}_{1:T}|\mathbf{x}_{0}) = \prod_{t=1}^Tq(\mathbf{x}_t|\mathbf{x}_{t-1}).$$

If we define $\bar{\alpha}_t = \prod_{s = 1}^t(1 - \beta_s),$ then we are supposed to derive the diffusion kernel

$$q(\mathbf{x}_t|\mathbf{x}_{0}) = \mathcal{N}(\mathbf{x}_t;\sqrt{\bar{\alpha}_t}\mathbf{x}_0,(1 - \bar{\alpha}_t)\mathbf{I})$$

I can definitely see why the mean is what it is, but I'm having a hard time seeing where we get the variance from. How is the variance derived?

While I think this question fits here, you might have better chances of an answer in stats.stackexchange.com — Dr. Snoopy, Nov 13 '22 at 15:53
https://math.stackexchange.com/a/4476159/574215 This answer exactly derives both the mean and variance. — Eureka Zheng, Nov 19 '22 at 20:12

score 0 · Answer 1 · answered Apr 14 '23 at 13:04

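The reparameterization trick tells us that a Gaussian sample can be written as a deterministic function of standard normal noise:

$$\begin{aligned} \mathbf{z} &\sim \mathcal{N}(\mathbf{z}; \boldsymbol{\mu}, \boldsymbol{\sigma}^2\boldsymbol{I}) \\ \mathbf{z} &= \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\epsilon} \text{, where } \boldsymbol{\epsilon} \sim \mathcal{N}(0, \boldsymbol{I}) \end{aligned}$$

Define $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$. Applying the trick to $q(\mathbf{x}_t \vert \mathbf{x}_{t-1})$, whose standard deviation is $\sqrt{\beta_t} = \sqrt{1 - \alpha_t}$, and unrolling the recursion gives:

$$ \begin{aligned} \mathbf{x}_t &= \sqrt{\alpha_t}\mathbf{x}_{t-1} + \sqrt{1 - \alpha_t}\boldsymbol{\epsilon}_{t-1} \\ &= \sqrt{\alpha_t}(\sqrt{\alpha_{t-1}} \mathbf{x}_{t-2} + \sqrt{1 - \alpha_{t-1}}\boldsymbol{\epsilon}_{t-2}) + \sqrt{1 - \alpha_t} \boldsymbol{\epsilon}_{t-1} \\ &= \sqrt{\alpha_t \alpha_{t-1}}\mathbf{x}_{t-2} + \sqrt{\alpha_t(1 - \alpha_{t-1})}\boldsymbol{\epsilon}_{t-2} + \sqrt{1 - \alpha_{t}}\boldsymbol{\epsilon}_{t-1} \\ &= \sqrt{\alpha_t \alpha_{t-1}}\mathbf{x}_{t-2} + \sqrt{\alpha_t(1 - \alpha_{t-1})+(1 - \alpha_{t})}\bar{\boldsymbol{\epsilon}}_{t-2} \\ &= \sqrt{\alpha_t \alpha_{t-1}}\mathbf{x}_{t-2} + \sqrt{1-\alpha_t \alpha_{t-1}}\bar{\boldsymbol{\epsilon}}_{t-2} \\ &= \dots \\ &= \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\boldsymbol{\epsilon} \end{aligned} $$

The step that produces the variance is the fourth line. The noises $\boldsymbol{\epsilon}_{t-2}$ and $\boldsymbol{\epsilon}_{t-1}$ are independent standard normals, and the sum of two independent zero-mean Gaussians with variances $a\mathbf{I}$ and $b\mathbf{I}$ is again a zero-mean Gaussian with variance $(a + b)\mathbf{I}$. The two noise terms can therefore be merged into a single term $\sqrt{a + b}\,\bar{\boldsymbol{\epsilon}}_{t-2}$ with $\bar{\boldsymbol{\epsilon}}_{t-2} \sim \mathcal{N}(0, \boldsymbol{I})$, where $a = \alpha_t(1 - \alpha_{t-1})$ and $b = 1 - \alpha_t$, so that $a + b = 1 - \alpha_t\alpha_{t-1}$. Repeating this merge down to $\mathbf{x}_0$ leaves a single noise term with variance $1 - \bar{\alpha}_t$.

Reading the mean and variance off the last line:

$$q(\mathbf{x}_t \vert \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t; \sqrt{\bar{\alpha}_t} \mathbf{x}_0, (1 - \bar{\alpha}_t)\mathbf{I})$$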
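As a sanity check on the derivation, here is a minimal NumPy sketch that compares the closed-form kernel against the step-by-step forward process. The linear $\beta_t$ schedule, the choice $T = 50$, the scalar $x_0 = 1.5$, and the helper name `forward_chain` are all assumptions made for illustration; none of them come from the tutorial.

```python
import numpy as np


def forward_chain(x0, betas, rng):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps one step at a time."""
    x = x0.copy()
    for beta in betas:
        eps = rng.standard_normal(x.shape)  # fresh, independent noise at every step
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
    return x


rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)  # assumed linear schedule with T = 50
alphas = 1.0 - betas
alpha_bar = np.prod(alphas)          # \bar{alpha}_T

# Exact variance recursion: v_t = alpha_t * v_{t-1} + (1 - alpha_t), with v_0 = 0
# because x_0 is fixed (we condition on it).
v = 0.0
for a in alphas:
    v = a * v + (1.0 - a)

# Monte Carlo estimate: many chains, all started from the same scalar x_0.
x0_value = 1.5
x0 = np.full(200_000, x0_value)
xt = forward_chain(x0, betas, rng)

closed_mean = np.sqrt(alpha_bar) * x0_value
closed_var = 1.0 - alpha_bar
empirical_mean = xt.mean()
empirical_var = xt.var()

print("recursion variance:", v, "closed form:", closed_var)
print("empirical mean:", empirical_mean, "closed form:", closed_mean)
print("empirical variance:", empirical_var, "closed form:", closed_var)
```

The recursion should match $1 - \bar{\alpha}_T$ up to floating-point error, while the sampled mean and variance should agree with the closed form only up to Monte Carlo noise.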
