
I am trying to accomplish the reverse of the typical MNIST task in machine learning using a GAN - instead of predicting a number from an image of a digit, I want to reconstruct an image of a digit from a number. The traditional GAN, however, isn't designed for this use case: it generates images similar to the training data directly, without being given an input. One way to work around this issue that I've thought of is to take a training label (the digit), connect it to a densely-connected Dense(784) layer, reshape the output to (28 x 28 x 1), and then proceed with the generator as one usually does for a GAN. However, this seems like "fooling" the neural network into making up weights out of thin air, and I doubt it would work properly.
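
A minimal sketch of that workaround in Keras (everything beyond the Dense(784) and reshape is just an illustrative generator stack):

```python
import tensorflow as tf
from tensorflow.keras import layers

# The workaround described above: map the digit label straight to a
# 784-vector, reshape it into an image, then continue with a normal
# generator stack. Layer choices after the reshape are illustrative only.
label_in = layers.Input(shape=(1,))
x = layers.Dense(784, activation="relu")(label_in)
x = layers.Reshape((28, 28, 1))(x)
x = layers.Conv2DTranspose(32, 3, padding="same", activation="relu")(x)
img_out = layers.Conv2D(1, 3, padding="same", activation="tanh")(x)
generator = tf.keras.Model(label_in, img_out)
```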

How can I modify a GAN so that it takes single-digit inputs without resorting to the aforementioned approach?

asked by JS4137

2 Answers


This is a common use case for GANs: you want the output to be conditioned on some controlled input, as opposed to just random seed data.

This Medium article cGAN: Conditional Generative Adversarial Network — How to Gain Control Over GAN Outputs walks through creating your example almost exactly (using MNIST digits and selecting which one you want).

The basic changes you need to make to turn your "freely choose class" GAN into a conditional GAN (a minimal Keras sketch follows the list):

  • Add an input to the generator, in addition to the random seed, that describes the class you want to generate. For a digit generator, that might be a one-hot array for the digit class you want it to make.

  • Add the same input to the discriminator, alongside the image input. That may involve having inputs at different layers, so that you can combine CNN and fully-connected layers more easily. Typically you would concatenate the class choice to the flattened last CNN layer, and use this concatenated vector as input to the first fully-connected layer. But you could concatenate the class data to any layer before the output.

  • Train as before, whilst tracking which class is being faked (or real) and ensuring the generator and discriminator are fed the correct class details during training.
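
For concreteness, here is a minimal Keras sketch of those changes. The exact layer sizes and activations are illustrative, not prescriptive:

```python
import tensorflow as tf
from tensorflow.keras import layers

NOISE_DIM, NUM_CLASSES = 100, 10

# Generator: random seed + one-hot class label -> 28x28 image.
noise_in = layers.Input(shape=(NOISE_DIM,))
label_in = layers.Input(shape=(NUM_CLASSES,))  # one-hot digit class
g = layers.Concatenate()([noise_in, label_in])
g = layers.Dense(7 * 7 * 128, activation="relu")(g)
g = layers.Reshape((7, 7, 128))(g)
g = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(g)
g_out = layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh")(g)
generator = tf.keras.Model([noise_in, label_in], g_out)

# Discriminator: image + the same one-hot label. The label is concatenated
# to the flattened last CNN layer, as described above.
img_in = layers.Input(shape=(28, 28, 1))
d_label_in = layers.Input(shape=(NUM_CLASSES,))
d = layers.Conv2D(64, 4, strides=2, padding="same")(img_in)
d = layers.LeakyReLU(0.2)(d)
d = layers.Conv2D(128, 4, strides=2, padding="same")(d)
d = layers.LeakyReLU(0.2)(d)
d = layers.Flatten()(d)
d = layers.Concatenate()([d, d_label_in])
d = layers.Dense(128)(d)
d = layers.LeakyReLU(0.2)(d)
d_out = layers.Dense(1, activation="sigmoid")(d)  # real/fake score
discriminator = tf.keras.Model([img_in, d_label_in], d_out)
```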

You could optionally provide incorrect labels for some real or fake images, and score them appropriately, but that is not strictly necessary. The discriminator should detect that something declared to be a '1' looks more like a '2' and mark it as fake, without needing to be specifically trained for it.
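
A sketch of one training step under that scheme, assuming the models from the sketch above are compiled, and that `combined` is the usual stacked generator-into-discriminator model with the discriminator's weights frozen (standard cGAN plumbing, not something specific to this question):

```python
import numpy as np

def train_step(real_images, real_labels):
    """One cGAN update; real_labels are one-hot and batch-aligned."""
    batch = real_images.shape[0]
    noise = np.random.normal(size=(batch, NOISE_DIM))

    # The generator is told which class to fake; the discriminator is
    # shown the same labels so it can judge image/label consistency.
    fake_images = generator.predict([noise, real_labels], verbose=0)

    # Discriminator: real images with their true labels -> 1, fakes -> 0.
    d_loss_real = discriminator.train_on_batch(
        [real_images, real_labels], np.ones((batch, 1)))
    d_loss_fake = discriminator.train_on_batch(
        [fake_images, real_labels], np.zeros((batch, 1)))

    # Generator update (through the frozen discriminator): try to get
    # fakes carrying these labels scored as real.
    g_loss = combined.train_on_batch([noise, real_labels],
                                     np.ones((batch, 1)))
    return d_loss_real, d_loss_fake, g_loss
```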

Neil Slater

  • maybe also GAN inversion could be used, but I agree with you that cGANs are actually the correct answer (+1) – Alberto Jul 31 '23 at 12:36
  • @AlbertoSinigaglia Yes, GAN inversion can be a way to adapt an existing trained GAN, or extract more complex descriptive vectors. It's maybe worth an answer as an alternative approach. – Neil Slater Jul 31 '23 at 12:38
  • Yup, you're right, done – Alberto Jul 31 '23 at 12:44
  • @NeilSlater If the problem is non-categorical (say, the labels are arbitrary numbers), would a cGAN still be effective at solving the problem? As it seems that cGANs are mostly optimized for categorical data? – JS4137 Aug 01 '23 at 04:43
  • @JS4137 a cGAN would not learn to generalise to a space like arbitrary numbers. It is not just the very large space, but also the very specific results expected for the smallest differences. Other continuous spaces might still be possible, e.g. what direction the portrait of a person was looking in, or the colour of a generated car's paint. The results might not be as precise as e.g. generating from a ray tracing system, but could be made acceptable using a cGAN-like setup – Neil Slater Aug 01 '23 at 06:46
  • @NeilSlater Sorry to ask a follow-up question, but given a non-categorical but structured multi-input problem (such as given the labels of a car's color, the year it was produced, its dimensions, and other info about that car, asking the cGAN to produce an image of that car), would a cGAN still be workable? In my experimentation it has not worked so well. – JS4137 Aug 01 '23 at 09:38
  • @JS4137 Yes, that's a bit much for cGAN. You are probably getting into something more like CLIP-guided generation - https://wandb.ai/geekyrakshit/stylegan-nada/reports/Digging-Into-StyleGAN-NADA-for-CLIP-Guided-Domain-Adaptation--VmlldzoyMjA5MDU1 - although that won't specifically follow all the technical details you are listing, there may be a similar architecture that could. – Neil Slater Aug 01 '23 at 16:38

I agree with @Neil's answer, as I also strongly believe that cGANs are the actual answer to your problem.

However, as he suggested, it's perhaps worth mentioning that GAN inversion can also be used to achieve such results; it's used when training a new GAN from scratch is too expensive.
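
As a rough illustration of the idea (the pretrained unconditional `generator` below is hypothetical): keep the GAN frozen and optimise only the latent vector so that the generated image matches a target image of the digit you want.

```python
import tensorflow as tf

def invert(generator, target_image, noise_dim=100, steps=1000, lr=0.05):
    """Find a latent z whose generated image matches target_image.

    generator is a hypothetical pretrained (unconditional) MNIST GAN
    generator mapping (1, noise_dim) -> (1, 28, 28, 1); its weights are
    never updated, only the latent z is optimised.
    target_image has shape (28, 28, 1).
    """
    z = tf.Variable(tf.random.normal([1, noise_dim]))
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    target = tf.convert_to_tensor(target_image[None, ...], tf.float32)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            # Pixel-wise reconstruction loss between G(z) and the target.
            loss = tf.reduce_mean(
                tf.square(generator(z, training=False) - target))
        opt.apply_gradients([(tape.gradient(loss, z), z)])
    return z
```

Inverting a few real examples of each digit gives you latents you can reuse or interpolate, so you get digit-conditioned outputs without retraining the GAN.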

Alberto