4

I've been training a VAE to reconstruct human names. With a batch size of 100+, after about 5 hours of training it tends to output the same thing regardless of the input (I'm using teacher forcing as well). With a batch size of 1 it overfit badly, while a batch size of 16 gave much better generalization. Is there something about VAEs that would make this happen, or is it specific to my problem?

user8714896

1 Answer

0

My response is based on my limited experience with VAEs:

These networks generate a random sample z conditioned on x, and the decoder output D(z) is then compared with x (||x − D(z)||). If the zs in the batch carry no randomness, the network will not train properly. In other words, there should not be a deterministic one-to-one correspondence between x and D(z); instead, the encoder should output a probability distribution over z, from which z is sampled. I hope that makes sense.
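To make the point concrete, here is a minimal numpy sketch (not the asker's actual model; the function names are illustrative) of the two pieces involved: the reparameterized sample z = μ + σ·ε that keeps the encoder stochastic, and the standard ELBO loss combining the reconstruction term ||x − D(z)||² with the closed-form KL divergence to N(0, I) for a diagonal Gaussian encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, eps ~ N(0, I): the stochastic step that
    # prevents a deterministic one-to-one x -> z correspondence
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    # reconstruction term: ||x - D(z)||^2, summed per sample,
    # averaged over the batch
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # KL( N(mu, sigma^2) || N(0, I) ) in closed form for a
    # diagonal Gaussian encoder; this pushes q(z|x) toward the prior
    kl = np.mean(-0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1))
    return recon + kl
```

If the KL term dominates, the encoder can collapse to the prior and the decoder ignores z, which produces exactly the "same output regardless of input" symptom (often called posterior collapse); KL annealing or a smaller KL weight early in training is a common mitigation.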
