
I have implemented a diffusion probabilistic model, and I am finding some of the model behavior unexpected.

When I draw samples from an untrained reverse diffusion process with 20 denoising steps using a cosine noise schedule, I find that the elements of my final denoised samples have a magnitude of around 400 (this number increases into the 1000s if I reverse diffuse for more steps). Is it normal for the magnitude of the final denoised sample to scale with the number of diffusion steps in this way?

I have given an MWE below in which I have replaced the predicted noise (which a neural network would provide in a denoising diffusion probabilistic model setting) with actual noise (I don't think this matters, as setting the predicted noise to 0 produces the same effect).

import numpy as np

def cosine_beta_schedule(T, s = 0.008):
    """
    cosine schedule as proposed in https://proceedings.mlr.press/v139/nichol21a/nichol21a.pdf
    """
    t = np.linspace(0, T, T + 1)
    alphas_cumprod = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    betas_clipped = np.clip(betas, a_min = 0, a_max = 0.999)

    return betas_clipped

T = 20 # number of diffusion steps
betas = cosine_beta_schedule(T)
alphas = 1. - betas
alphas_cumprod = np.cumprod(alphas)
alphas_cumprod_prev = np.hstack([1., alphas_cumprod[:-1]])
sqrt_one_minus_alphas_cumprod = np.sqrt(1. - alphas_cumprod)
posterior_variance = np.clip(betas * (1. - alphas_cumprod_prev) / (1. - alphas_cumprod), a_min = 1e-20, a_max = 1)

# reverse process sampling loop (Algorithm 2 of https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf)
x_t = np.random.randn(1)
for t in np.arange(T)[::-1]:
    noise_pred = np.random.randn(1)  # surrogate for a neural network that predicts the noise added to the data
    posterior_mean = 1 / np.sqrt(alphas[t]) * (x_t - betas[t] / sqrt_one_minus_alphas_cumprod[t] * noise_pred)
    posterior_variance = betas[t] * (1. - alphas_cumprod_prev[t]) / (1. - alphas_cumprod[t])
    x_t = posterior_mean + np.sqrt(posterior_variance) * np.random.randn(1)
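
For reference, here is a minimal sketch of how I measure the scaling with the number of steps: it wraps the sampling loop above in a helper (the name `final_sample_magnitude`, the choice of T values, and the number of repeats are arbitrary and only for illustration) and prints the average magnitude of the final sample for a few values of T.

def final_sample_magnitude(T, s = 0.008, n_repeats = 100):
    # set up the schedule exactly as in the MWE above
    betas = cosine_beta_schedule(T, s)
    alphas = 1. - betas
    alphas_cumprod = np.cumprod(alphas)
    alphas_cumprod_prev = np.hstack([1., alphas_cumprod[:-1]])
    sqrt_one_minus_alphas_cumprod = np.sqrt(1. - alphas_cumprod)

    magnitudes = []
    for _ in range(n_repeats):
        x_t = np.random.randn(1)
        for t in np.arange(T)[::-1]:
            noise_pred = np.random.randn(1)  # surrogate for the noise-prediction network
            posterior_mean = 1 / np.sqrt(alphas[t]) * (x_t - betas[t] / sqrt_one_minus_alphas_cumprod[t] * noise_pred)
            posterior_variance = betas[t] * (1. - alphas_cumprod_prev[t]) / (1. - alphas_cumprod[t])
            x_t = posterior_mean + np.sqrt(posterior_variance) * np.random.randn(1)
        magnitudes.append(np.abs(x_t[0]))
    return np.mean(magnitudes)

for T in (20, 50, 100, 200):
    print(T, final_sample_magnitude(T))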