I would like to train a diffusion model with an additional loss on the generated image. Without going into too much detail, my intention is to do something like regularization; for example, you may imagine that I want to make sure the generated image is smooth, or something of the sort.

My thinking was to add an extra term during training. The vanilla training loss is: $$L = \|\epsilon-\epsilon_\theta(x_t, t)\|^2$$ where $\epsilon_\theta$ is the model learning to predict the noise $\epsilon$ added to the original image. My suggested loss is: $$L = \|\epsilon-\epsilon_\theta(x_t, t)\|^2 + \lambda \, L'(img_\theta)$$ where $L'$ is my additional loss (which may, for example, induce smoothness) and $img_\theta$ is the image that we get after denoising $x_t$ using the predicted noise.

For standard models it is trivial that this makes sense. Due to the iterative nature of diffusion models, I'm not sure whether it makes sense specifically for them. I wasn't able to find any work that does something like this and would appreciate any help: does my additional loss make sense to add?
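In (PyTorch-style) code, what I have in mind is roughly the following sketch; `denoise_to_image` is just a placeholder for however the image is obtained from $x_t$ and the predicted noise, and `extra_loss` stands in for $L'$:

```python
import torch

def combined_loss(model, x_t, t, eps, denoise_to_image, extra_loss, lam):
    # Vanilla noise-prediction loss plus a penalty on the produced image.
    eps_pred = model(x_t, t)
    loss = ((eps - eps_pred) ** 2).mean()        # ||eps - eps_theta(x_t, t)||^2
    img = denoise_to_image(x_t, t, eps_pred)     # img_theta: image recovered from x_t and eps_pred
    return loss + lam * extra_loss(img)          # + lambda * L'(img_theta)
```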
1 Answer
I don't see why it should not work.
The tough part, though, is how to implement it. The standard training algorithm samples a single timestep along the diffusion sequence, draws $\epsilon$, computes $x_t$ for a batch, and evaluates the loss.
In your case you would also have to generate the final image for the whole batch at every evaluation of the loss, which is extremely time consuming. But other than that, I think it should work.
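To make the point concrete, here is a rough PyTorch sketch of such a training step. The schedule, `model`, `extra_loss` and `lam` are placeholders; the first half mirrors Algorithm 1 of Ho et al. 2020, and the loop at the end is the full reverse chain that your extra term would require (backpropagating through all of it is usually infeasible, which is exactly the cost I am referring to).

```python
import torch

# Placeholder linear noise schedule; `model` and `extra_loss` stand in for the
# actual network and the additional penalty L'.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

def training_step_full_chain(model, x0, extra_loss, lam=0.1):
    b = x0.shape[0]
    ab_all = alpha_bar.to(x0.device)
    t = torch.randint(0, T, (b,), device=x0.device)   # sample a single timestep per image
    eps = torch.randn_like(x0)                        # draw the noise
    ab = ab_all[t].view(b, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps      # forward-diffuse the batch
    eps_pred = model(x_t, t)
    loss = ((eps - eps_pred) ** 2).mean()             # vanilla noise-prediction loss

    # Extra term: produce a generated image for the batch by running the whole
    # reverse chain. This is the expensive part; differentiating through all
    # T steps is usually infeasible, hence the idea of shortening/skipping steps.
    a_all, b_all = alphas.to(x0.device), betas.to(x0.device)
    img = torch.randn_like(x0)
    for s in reversed(range(T)):
        ts = torch.full((b,), s, device=x0.device, dtype=torch.long)
        e = model(img, ts)
        img = (img - (1 - a_all[s]) / (1 - ab_all[s]).sqrt() * e) / a_all[s].sqrt()
        if s > 0:
            img = img + b_all[s].sqrt() * torch.randn_like(img)
    return loss + lam * extra_loss(img)
```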

Peblo
- It's time consuming because I'd have to run the whole denoising process? So, for example, it might take 50 denoising steps instead of the single step you usually do during training, is that what you mean? Perhaps it would make sense to denoise using a single step only? That would spare the full denoising process (see the single-step sketch after this thread). Anyhow, I intend to work with low-resolution (MNIST) images, so I'm not that worried about efficiency. – Gilad Deutsch Jul 05 '23 at 18:33
- @GiladDeutsch yes, that is what I meant by "time consuming". You may partially mitigate the problem by skipping a few denoising steps (see the reply by Ciodar to [this question](https://ai.stackexchange.com/questions/41068/about-cosine-noise-schedule-in-diffusion-model/41144#41144)), or by adding the penalization term only every _N_ batches, at random. Keep me posted, I am curious to know what solution you end up with. – Peblo Jul 06 '23 at 11:46
- Are you sure that the recursive nature of the diffusion model (i.e., it iteratively denoises the image) doesn't create an issue with learning from a loss on the generated image? It seems like this would run into the same problems RNNs have. – Gilad Deutsch Jul 27 '23 at 09:11
- I am not sure which RNN issue you are referring to. Regarding the recursiveness: keep in mind that _generation_, i.e., the whole reverse process at prediction time, _is_ indeed recursive, but the learning phase is _not_. During training the network only needs to know the timestep $t$ under evaluation and update the weights; it does _not_ need to recurse. Look at Algorithm 1 in Ho et al. 2020, step 5. – Peblo Jul 31 '23 at 12:51
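A minimal sketch of the single-step shortcut discussed in the comments, assuming the same placeholder schedule and names as the sketch in the answer: instead of running the reverse chain, estimate $\hat{x}_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t))/\sqrt{\bar\alpha_t}$ directly from the predicted noise and apply the penalty to that estimate.

```python
import torch

# Same placeholder schedule as in the answer's sketch.
T = 1000
alpha_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

def training_step_single_step(model, x0, extra_loss, lam=0.1):
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    eps = torch.randn_like(x0)
    ab = alpha_bar.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
    eps_pred = model(x_t, t)
    loss = ((eps - eps_pred) ** 2).mean()

    # Single-step estimate of x0 from the predicted noise: cheap, and the
    # gradient of the extra term flows through one network call only.
    x0_hat = (x_t - (1 - ab).sqrt() * eps_pred) / ab.sqrt()
    return loss + lam * extra_loss(x0_hat)
```

This keeps the extra term differentiable through a single network call, so each training step stays roughly as cheap as the vanilla one.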