Picking an appropriate size for the bottleneck in autoencoders seems to be neither a trivial nor an intuitive task. After watching this video about VAEs, I've been wondering: do disentangled VAEs solve this problem?
After all, if the network is trained to use as few latent variables as possible, I might as well make the bottleneck large enough that I never run into capacity issues during training. Am I wrong somewhere?
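To make the intuition concrete: in a (β-)VAE the KL term of the objective penalizes each latent dimension for deviating from the standard normal prior, so dimensions the decoder doesn't use tend to collapse to the prior and contribute ≈0 KL. A rough sketch of how one might measure which dimensions are "active", assuming a diagonal-Gaussian encoder that outputs `mu` and `logvar` (the encoder outputs below are made-up numbers, not from a trained model):

```python
import numpy as np

def kl_per_dim(mu, logvar):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ) per latent dimension,
    # for a diagonal-Gaussian posterior parameterized by mu and log-variance.
    return 0.5 * (mu**2 + np.exp(logvar) - logvar - 1.0)

# Hypothetical encoder outputs for a batch of 2 samples with 4 latent dims:
# the last two dims have collapsed to the prior (mu=0, logvar=0, i.e. sigma=1).
mu = np.array([[1.2, -0.8, 0.0, 0.0],
               [0.9,  1.1, 0.0, 0.0]])
logvar = np.array([[-2.0, -1.5, 0.0, 0.0],
                   [-1.8, -2.2, 0.0, 0.0]])

kl = kl_per_dim(mu, logvar).mean(axis=0)  # average KL per dimension over the batch
active = kl > 0.01                        # heuristic threshold for "in use"
print(kl)      # collapsed dims show ~0 KL
print(active)
```

Under this view, oversizing the bottleneck costs little in principle (unused dims carry no information), though in practice too strong a KL pressure can also collapse dimensions you actually needed, so it isn't entirely free.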