I've been training a VAE to reconstruct human names. When I train it with a batch size of 100+, after about 5 hours of training it tends to output the same thing regardless of the input (I'm using teacher forcing as well). When I used a batch size of 1, it severely overfit, while a batch size of 16 gave much better generalization. Is there something about VAEs that would make this happen, or is it specific to my problem?
1 Answer
My response is based on my limited experience with VAEs:

These networks draw a random sample z conditioned on x, and the decoder output D(z) is then compared with x via the reconstruction term ||x - D(z)||². If the z drawn for each x in the batch carries no randomness, the network will not train properly. In other words, there should not be a direct, deterministic correspondence between x and D(z); instead, the encoder should output a probability distribution q(z|x) from which z is sampled. I hope that makes sense.
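To make this concrete, here is a minimal PyTorch sketch of the VAE sampling step and loss. It is only an illustration under assumptions of mine: the feed-forward layers stand in for your sequence encoder/decoder, and the layer sizes, the MSE reconstruction term, and the `beta` weight are illustrative, not taken from your setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=64, h_dim=128, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z is a *random sample* from q(z|x),
        # not a deterministic code. This is the randomness referred to
        # above -- remove it and the model degenerates into a plain
        # autoencoder.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # Reconstruction term ||x - D(z)||^2 ...
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # ... plus KL(q(z|x) || N(0, I)), which keeps the encoder output
    # a distribution rather than a point estimate.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

One design note relevant to your symptom: if the KL term dominates early in training, the decoder can learn to ignore z entirely and emit the same output for every input (posterior collapse); annealing `beta` from 0 upward over the first epochs is a common mitigation.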
