How to evaluate the performance of an autoencoder trained on image data?

Question

I am training an autoencoder on (general) image data.

I use the binary cross-entropy loss function, but its value is not very informative when I want to evaluate how well my autoencoder performs.

An obvious performance metric would be pixel-wise MSE, but it has its own downsides, as illustrated by the toy examples in a figure from the paper by Pihlgren et al.
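To make the MSE downside concrete, here is a minimal NumPy sketch. The stripe image is my own toy example, not the one from the paper, and it assumes pixel values scaled to [0, 1]. A copy shifted by one pixel looks essentially the same to a human, yet it scores a worse MSE than a flat gray image that has lost all structure.

```python
import numpy as np

def mse(x, y):
    """Pixel-wise mean squared error between two images of the same shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

# Toy 8x8 image: vertical stripes alternating between 0 and 1.
stripes = np.tile(np.array([0.0, 1.0]), (8, 4))

# Candidate "reconstructions" (hypothetical, for illustration only):
shifted = np.roll(stripes, 1, axis=1)   # same pattern, moved one pixel sideways
gray = np.full_like(stripes, 0.5)       # structure completely lost

mse_shifted = mse(stripes, shifted)
mse_gray = mse(stripes, gray)
print(f"MSE shifted: {mse_shifted:.2f}, MSE gray: {mse_gray:.2f}")
```

Here the gray image wins on MSE even though it preserves none of the pattern, which is the kind of behaviour that makes MSE hard to trust on its own.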

In the same paper, the authors suggest using a perceptual loss, but it seems complicated and is not well studied.

I found some other instances of this question, but there doesn't seem to be a consensus.

I understand that it depends on the application, but I want to know if there are some general guidelines as to which performance metric to use when training autoencoders on image data.

score 1 · Answer 1 · answered Jun 24 '20 at 12:37

I will answer my own question to try and provide some insights.

My research supervisor suggested that I use SSIM or some other well-known image quality metric (see the book "Modern Image Quality Assessment" by Wang and Bovik) to assess the visual similarity between the input and reconstructed images.
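As a rough illustration of what SSIM measures, here is a simplified sketch that evaluates the SSIM formula once over the whole image. This is an assumption made for brevity: the standard metric averages the same formula over local sliding windows, which is what `skimage.metrics.structural_similarity` in scikit-image does. The constants K1 = 0.01 and K2 = 0.03 are the commonly used defaults, and the images are assumed to be grayscale with values in [0, 1].

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM formula applied to the whole image as a single window.

    Simplified variant for illustration; real SSIM averages this
    quantity over many local windows.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2   # stabilises the luminance term
    c2 = (k2 * data_range) ** 2   # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

# Hypothetical data: a random "image" and a noisy "reconstruction".
rng = np.random.default_rng(0)
image = rng.random((32, 32))
noisy = np.clip(image + rng.normal(0.0, 0.2, image.shape), 0.0, 1.0)

ssim_same = global_ssim(image, image)
ssim_noisy = global_ssim(image, noisy)
print(f"SSIM identical: {ssim_same:.3f}, SSIM noisy: {ssim_noisy:.3f}")
```

For actual evaluation I would use a library implementation on the test set and report the mean over images, rather than this single-window version.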

Another way I evaluate the performance of an autoencoder is by simply visually comparing input and output images taken from the test set. This is by no means rigorous, but it gives a good idea of whether the autoencoder is able to reconstruct the input images.

One thing I would add is that even if an autoencoder can reconstruct images perfectly, that doesn't mean the encoding it learned is useful. For example, when I wanted similar images to be mapped to similar encodings, the autoencoder that did this better produced blurrier reconstructions than the autoencoder that did not preserve similarity (but produced better reconstructions).
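One possible way to put a number on "similar images get similar encodings" is sketched below. It assumes that class labels are available and that "similar" means "same label", which is my simplification for this example and not the only reasonable definition. The function name `neighbor_label_agreement` and the toy codes are hypothetical.

```python
import numpy as np

def neighbor_label_agreement(codes, labels, k=5):
    """Fraction of each encoding's k nearest neighbours (Euclidean
    distance in latent space) that share its label, averaged over
    all encodings."""
    codes = np.asarray(codes, dtype=np.float64)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = codes[:, None, :] - codes[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    return float(np.mean(labels[nearest] == labels[:, None]))

# Toy latent codes: two well-separated clusters, one per label.
codes = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10]])
labels = np.array([0, 0, 0, 1, 1, 1])

agreement = neighbor_label_agreement(codes, labels, k=2)
print(f"Neighbour label agreement: {agreement:.2f}")
```

Comparing this score between two autoencoders, alongside a reconstruction metric, makes the trade-off described above visible: one model can score higher on neighbour agreement while scoring lower on reconstruction quality.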
