
A while ago I read that you can make subtle changes to an image that will cause a good CNN to horribly misclassify it. I believe the changes must exploit details of the specific CNN that will be used for classification. So we can trick a good CNN into classifying an image as a picture of a bicycle when any human would say it's an image of a dog. What do we call this technique, and is there an effort to make image classifiers robust against it?

Dennis Soemers
Ted Ersek
  • This seems to be at least a partial duplicate of this: https://ai.stackexchange.com/q/15820/2444 (how can we generate inputs to fool an AI? Is there any research on this?), or this: https://ai.stackexchange.com/q/6800/2444 (in what ways could you fool an "AI", not just a neural network, so that question seems to be too broad), or this: https://ai.stackexchange.com/q/92/2444 (how is it possible to fool a NN), or this: https://ai.stackexchange.com/q/6892/2444 (what tools are used to deal with AML). – nbro Feb 18 '21 at 19:43
  • However, currently, none of these seem to be exact duplicates, so I will leave this open for a while, because your question focuses on **CNNs**, and I hope that the answers below take that into account. – nbro Feb 18 '21 at 20:07

1 Answer


These are known as adversarial attacks, and the specific examples that are misclassified are known as adversarial examples.

There is a reasonably large body of work on finding adversarial examples, and on making CNNs more robust (i.e. less prone to these attacks). An example is the DeepFool algorithm, which can be used to find minimal perturbations of an input that cause the predicted label to change.
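
To make this concrete, here is a minimal sketch of the fast gradient sign method (FGSM) from [1], which is an even simpler attack than DeepFool. The PyTorch classifier `model`, the input `image`, and its true `label` are assumed placeholders, not anything taken from the papers cited below.

```python
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Craft adversarial examples with the fast gradient sign method.

    `model` is assumed to be a PyTorch classifier returning logits,
    `image` a float tensor of shape (N, C, H, W) with values in [0, 1],
    and `label` an integer tensor of shape (N,).
    """
    image = image.clone().detach().requires_grad_(True)

    # Loss of the model's prediction with respect to the true labels.
    loss = F.cross_entropy(model(image), label)

    # Gradient of that loss with respect to the input pixels.
    loss.backward()

    # Nudge each pixel slightly in the direction that increases the loss,
    # then clamp back into the valid pixel range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

With a small `epsilon`, the perturbed image usually looks unchanged to a human but is classified differently by the model, which is exactly the behaviour described in the question.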

There are several techniques in the literature that are used to defend against adversarial attacks. Here is a selection:

  1. Augmenting the training data with perturbed versions of the inputs, either randomly or adversarially perturbed (the latter is usually called adversarial training). This is intended to make the model robust to attacks that add small amounts of carefully chosen noise to an image. An example of this approach is discussed in [1]; a minimal training-step sketch is given after this list.

  2. Constructing some sort of model to "denoise" the input before feeding it into the CNN. An example of this is Defense-GAN [2], which uses a generative adversarial network that models the "true" image distribution and replaces the input with an approximation that lies closer to that distribution.
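
As a rough illustration of the first defence, here is a sketch of a single adversarial-training step in the spirit of [1]. The `model`, `optimizer`, `images`, and `labels` objects are assumed PyTorch placeholders; each batch is augmented with FGSM-perturbed copies (crafted as in the sketch above) so the model is trained on both clean and adversarial inputs.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, images, labels, epsilon=0.01):
    """One training step on a batch augmented with FGSM-perturbed inputs."""
    model.train()

    # Craft adversarial copies of the batch (same idea as the FGSM sketch
    # above, repeated here so this function stands on its own).
    perturbed = images.clone().detach().requires_grad_(True)
    F.cross_entropy(model(perturbed), labels).backward()
    perturbed = (perturbed + epsilon * perturbed.grad.sign()).clamp(0.0, 1.0).detach()

    # Discard the gradients accumulated while crafting the perturbations.
    optimizer.zero_grad()

    # Average the loss over clean and adversarial inputs so the model
    # learns to classify both correctly.
    loss = 0.5 * (F.cross_entropy(model(images), labels)
                  + F.cross_entropy(model(perturbed), labels))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Defences like Defense-GAN in the second item work at inference time instead, so they do not change the training loop at all.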

References

[1] Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy. Explaining and Harnessing Adversarial Examples. ICLR (2015). URL.

[2] Pouya Samangouei, Maya Kabkab & Rama Chellappa. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. ICLR (2018). URL.

htl