I'm trying to train a VAE on a graph dataset, but my latent space shrinks epoch by epoch, while my ELBO plot reaches a steady state after only a few epochs.
I played around with the hyperparameters and noticed that increasing the batch size or the amount of training data accelerates this: the latent space shrinks sooner, and the ELBO reaches its steady state even faster.
Is this a common problem with a general solution?
Given these symptoms, which part of the algorithm is most likely to be causing the issue? Is it a problem in how the loss function is computed? Does it look like the decoder is not trained well? Or is it more likely that the encoder has not learned features that are informative enough?
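For reference, by ELBO I mean the standard variational objective with a reconstruction term and a KL term:

$$
\mathcal{L}(\theta,\phi;x) \;=\; \underbrace{\mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big]}_{\text{reconstruction}} \;-\; \underbrace{D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\big\|\,p(z)\big)}_{\text{KL regularizer}}
$$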
Edit:
I figured out that the problem is most likely caused by the loss function. My loss is a combination of a KL term and a reconstruction term. On the GitHub page for graph auto-encoders, it is suggested that the loss function should include normalization factors that depend on the number of nodes in the graph. I haven't worked out the exact formulation, but after weighting my reconstruction loss by a factor of 100 and my KL loss by a factor of 0.5, the algorithm works fine. I would appreciate it if someone could explain how these normalization factors are actually supposed to be set up.
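In case it helps, here is a minimal PyTorch sketch of the normalization I believe that repo (tkipf/gae, which is written in TensorFlow) uses; the function name `gvae_loss` and its arguments are my own, so treat the details as my reading of it rather than the repo's exact code:

```python
import torch
import torch.nn.functional as F

def gvae_loss(logits, adj_label, mu, logvar, num_nodes):
    """Sketch of a graph-VAE loss with size-dependent normalization.

    logits:     (N, N) raw scores from the inner-product decoder
    adj_label:  (N, N) binary adjacency target, as a float tensor
    mu, logvar: (N, latent_dim) outputs of the encoder
    """
    n_total = float(num_nodes * num_nodes)
    n_edges = adj_label.sum()

    # Sparse graphs have far more non-edges than edges, so positive
    # entries are up-weighted in the cross-entropy.
    pos_weight = (n_total - n_edges) / n_edges
    # Rescales the reconstruction term so it stays comparable across
    # graphs of different sizes and densities.
    norm = n_total / (2.0 * (n_total - n_edges))

    recon = norm * F.binary_cross_entropy_with_logits(
        logits, adj_label, pos_weight=pos_weight
    )

    # KL divergence of q(z|x) from the standard normal prior,
    # averaged over nodes; the 1/num_nodes factor is, as far as I can
    # tell, the node-count normalization the repo recommends.
    kl = (-0.5 / num_nodes) * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    )
    return recon + kl
```

My guess is that my hand-tuned factors of 100 and 0.5 were roughly approximating the `norm` and `1/num_nodes` factors above, which would explain why the rescaling fixed the training.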