I am training an autoencoder on (general) image data.
I use binary crossentropy loss function, but it is not very informative when I want to evaluate the performance of my autoencoder.
An obvious performance metric would be pixel-wise MSE, but it has its own downsides, shown on some toy examples in an image from paper from Pihlgren et al.
In the same paper, the authors suggest using perceptual loss, but it seems complicated and not well-studied.
I found some other instances of this question, but there doesn't seem to be a concensus.
I understand that it depends on the application, but I want to know if there are some general guidelines as to which performance metric to use when training autoencoders on image data.