I'm working on research where my supervisor wants to do canonicalization of name data using VAEs, but I don't think it's possible, and I don't know how to show that mathematically. I just know empirically that VAEs don't do well when the latent and observed variables are discrete (to model names, the latent needs to be the character at each index, which can be any ASCII character, so it can only be represented as a categorical distribution). My setup is a VAE with three autoencoders, one latent each for first, middle, and last name, and each one samples every character of its name from the Gumbel-Softmax distribution (a differentiable relaxation of the categorical distribution, parameterized by categorical probabilities). From what I've seen in the original paper, even on the simple problem of MNIST digit generation both the inference and generative networks did worse as the latent dimension increased, and as you can imagine the latent dimension of my problem is quite large. That's the only real argument I have for why this can't work. The other would have been that it's a discrete distribution, but I addressed that by using the Gumbel-Softmax.
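For concreteness, here's a minimal numpy sketch of the per-character sampling I'm describing (the function name, the uniform logits, and the 128-way ASCII alphabet are just illustrative; my actual model does this with a neural net producing the logits):

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Draw a 'soft one-hot' sample over categories.

    logits: unnormalized log-probabilities, shape (num_categories,).
    tau: temperature; lower values push the sample closer to a hard one-hot.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via the inverse-CDF trick: g = -log(-log(u)).
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    # Softmax of the temperature-scaled perturbed logits is the relaxed sample.
    z = (logits + g) / tau
    z = z - z.max()  # for numerical stability
    e = np.exp(z)
    return e / e.sum()

# One "character" position over a 128-way ASCII alphabet:
logits = np.zeros(128)  # uniform logits, purely for illustration
sample = gumbel_softmax_sample(logits, tau=0.5)
```

The point is that `sample` is a probability vector over characters that is differentiable with respect to `logits`, which is what lets the reparameterization trick go through for a categorical latent.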
This setup isn't working at all: the generated names are total gibberish and training plateaus very early. Are there any mathematical intuitions or reasons why VAEs won't work on a problem like this?
As a note, I've also tried semi-supervised VAEs, and they didn't do much better. I even tried a seq2seq setup that just reconstructs a first name given a first name, and it failed badly too; the output wasn't even close to the original input, let alone a plausible generated name.