Suppose there is a dataset $D$ of images. The dataset contains a sufficiently large number $n$ of images, all belonging to a single class.

Suppose I generated a new image $I$ of the same class using a generator neural network, where $I$ is not present in the given dataset. I want to calculate how natural the image $I$ is with respect to the dataset $D$:

$m(I, D) = $ how natural the image $I$ is with respect to the dataset $D$ of images.

I don't want metrics that are applied to a bunch of generated images. I have only one generated image.


I came up with a naive metric:

$$m(I, D) = \sum\limits_{x \in D} d(x, I)^2$$

where $d(x, I)$, the difference between the two images, is defined as the sum of pixel-wise differences, i.e., $$d(x, I) = \sum\limits_{i} \|x_i - I_i\|$$ with $x_i$ and $I_i$ denoting the $i$-th pixels of $x$ and $I$ respectively.
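For concreteness, here is a minimal NumPy sketch of this naive metric, assuming all images are arrays of the same shape (the function name is just for illustration):

```python
import numpy as np

def naive_metric(I, D):
    """Naive pixel-level metric m(I, D): the sum over the dataset of the
    squared total absolute pixel difference between I and each image x.

    I: np.ndarray of shape (H, W) or (H, W, C)
    D: iterable of np.ndarray images with the same shape as I
    """
    return sum(
        np.abs(x.astype(np.float64) - I.astype(np.float64)).sum() ** 2
        for x in D
    )
```

With this definition, smaller values mean $I$ is pixel-wise closer to the dataset.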

But this measure only shows how similar the new image $I$ is to the set of images in my dataset at the pixel level. I want a measure of how natural it is.

hanugm

1 Answer

Evaluating synthetically generated images is challenging and an active area of research. The problem is that the task of judging "how natural an image is" is not well-defined and is inherently subjective.

To evaluate generated images, we can define two abstract properties: fidelity and diversity, since we usually want to generate not just a single high-quality image but also a variety of images from the domain.

There are several methods for automating and standardizing the evaluation of generated samples, such as the Inception Score (IS) and the Fréchet Inception Distance (FID). Both approaches use a CNN classification model (typically Inception-v3) pretrained on a large reference dataset such as ImageNet.

We can then use this pretrained model to classify generated images: the marginal distribution of predicted classes should be close to uniform for high diversity, while each individual image's prediction should be sharply concentrated on a single class for high fidelity. However, this approach (the Inception Score) does not capture how synthetic images compare to real images.
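As an illustration, here is a small NumPy sketch of the Inception Score computation, assuming you already have an array of class probabilities from the pretrained classifier (the function and variable names are illustrative):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score: exp(E_x[KL(p(y|x) || p(y))]).

    probs: array of shape (num_images, num_classes); each row is the
    classifier's predicted distribution p(y|x) for one generated image.
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

A high score requires both peaked per-image predictions (fidelity) and a near-uniform marginal over classes (diversity).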

Instead of comparing images pixel-wise, we can compare their abstract features. CNNs are known to be good at extracting abstract features, so we can use a pretrained CNN to extract a feature embedding from one of its last hidden layers and then compare embeddings using, for instance, Euclidean or cosine distance. A better way to compare the similarity between generated and real images is FID, which compares the statistics (mean and covariance) of such embeddings for the two sets. Here is an article on the topic for more details.
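Since you only have one generated image, the embedding-distance idea is the most directly applicable. Below is a hedged sketch, assuming a torchvision ResNet-50 pretrained on ImageNet as the feature extractor and mean cosine similarity as the aggregation; both choices are illustrative assumptions rather than a standard metric:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Assumption: a ResNet-50 pretrained on ImageNet serves as the feature
# extractor; any pretrained CNN and layer choice could be substituted.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Identity()  # drop classifier head -> 2048-d embeddings
model.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(pil_image):
    """Map a PIL image to its feature embedding."""
    return model(preprocess(pil_image).unsqueeze(0)).squeeze(0)

def naturalness(generated_image, dataset_images):
    """Mean cosine similarity between the generated image's embedding and
    the embeddings of the real images; higher suggests more 'natural'."""
    g = embed(generated_image)
    sims = [torch.nn.functional.cosine_similarity(g, embed(x), dim=0)
            for x in dataset_images]
    return float(torch.stack(sims).mean())
```

Comparing in feature space rather than pixel space means the score reflects high-level content (textures, shapes, object parts) instead of exact pixel agreement.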

Aray Karjauv