I'm working on understanding VAEs, mostly through the video lectures of Stanford CS231n. Lecture 13 in particular covers this topic, and I think I have a good theoretical grasp.
However, when looking at actual implementations, such as the code from this Keras blog post on VAEs, I see some differences that I can't quite understand.
Please take a look at this VAE architecture visualization from the class, specifically the decoder part. From the way it is presented here, I understand that the decoder network outputs the mean and covariance of the data distribution. To get an actual output (i.e. an image), we need to sample from the distribution that is parametrized by that mean and covariance, i.e. by the outputs of the decoder.
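To make my reading of the lecture concrete, here is a minimal sketch of what I think the decoder does there. Everything in it is a made-up illustration: `toy_decoder` is a hypothetical stand-in for the real network, the "image" has only 4 pixels, and I'm assuming a diagonal covariance stored as a log-variance.

```python
import math
import random

def toy_decoder(z):
    """Hypothetical stand-in for the decoder network (not from the lecture).

    Maps a 2-d latent vector z to the parameters of a diagonal Gaussian
    over a 4-pixel "image": a per-pixel mean and a per-pixel log-variance.
    """
    mean = [math.tanh(0.5 * z[0] + 0.1 * i) for i in range(4)]
    log_var = [-2.0 + 0.1 * z[1] for _ in range(4)]
    return mean, log_var

def sample_output(z, rng):
    """Draw one output x ~ N(mean(z), diag(exp(log_var(z))))."""
    mean, log_var = toy_decoder(z)
    # The standard deviation is exp(0.5 * log_var); add scaled Gaussian noise per pixel.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # latent sample from the prior
x_a = sample_output(z, rng)
x_b = sample_output(z, rng)  # same z, different noise, so a different output
```

In this picture the same `z` gives a different output on every call, because the decoder only provides the parameters and the output itself is drawn afterwards.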
Now if you look at the code from the Keras blog VAE implementation, you will see that there is no such sampling step. The decoder takes in a sample from the latent space and directly maps its input (the sampled z) to an output (e.g. an image), not to the parameters of a distribution from which an output is to be sampled.
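For comparison, this is roughly how I read the decoder in the blog code, again as a toy sketch rather than the actual Keras model. `toy_direct_decoder` and its weights are hypothetical, and the sigmoid output is my assumption for illustration.

```python
import math

def toy_direct_decoder(z):
    """Hypothetical stand-in for a decoder that maps z straight to pixels.

    Each of the 4 output pixels is a sigmoid of a linear function of z,
    so the values lie in (0, 1) and no sampling happens after the decoder.
    """
    return [1.0 / (1.0 + math.exp(-(0.5 * z[0] - 0.3 * z[1] + 0.1 * i)))
            for i in range(4)]

z = [0.2, -1.0]  # an already-sampled latent vector
x1 = toy_direct_decoder(z)
x2 = toy_direct_decoder(z)  # same z gives exactly the same output
```

Here the only randomness is in drawing `z`; once `z` is fixed, the output is deterministic.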
Am I missing something or does this implementation not correspond to the one presented in the lecture? I've been trying to make sense of it for quite some time now but still can't seem to understand the discrepancy.