
A common illustration of how a CNN works is the following: https://www.researchgate.net/figure/Learned-features-from-a-Convolutional-Neural-Network_fig1_319253577. It seems to suggest that a CNN classifies images in a similar manner to how a human does (i.e., based on visual features).

So why do adversarial attacks, such as FGSM, still work on CNNs? If the perturbation is strong enough for the CNN to pick up, shouldn't a human also be able to tell the difference?
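For context, FGSM nudges each pixel by a small step in the direction of the sign of the loss gradient, so the change is bounded by a tiny epsilon per pixel and is typically invisible to a human while still flipping the CNN's prediction. Below is a minimal PyTorch-style sketch of the idea; `model`, `x`, `y`, and `epsilon` are hypothetical placeholders, not taken from the question.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.01):
    # x: input image tensor in [0, 1] with a batch dimension; y: true label indices.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Move each pixel by +/- epsilon in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    # Keep the result a valid image; the perturbation stays within epsilon per pixel.
    return x_adv.clamp(0, 1).detach()
```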

  • I don't understand the question "If the perturbation is strong enough for CNN to pick up, shouldn't human also be able to tell the difference?" Are you interested in knowing whether humans are also fooled in the same way CNNs are fooled? – nbro Jan 14 '23 at 22:06
  • A common assumption is that a successful adversarial example should only fool the model, because otherwise it defeats its purpose – Sam Jan 15 '23 at 09:52
  • I am not sure this assumption is required, but maybe that's what people assume. It seems that you have 2 distinct questions: 1. why can we fool neural networks given that they seem to classify images in the same way as we do and we are not fooled, 2. if humans classify images as neural networks do, then why aren't we also fooled? – nbro Jan 15 '23 at 09:57
  • Yes, but I'm discussing CNN-type models in particular. As you can see from the reference I provided, a CNN extracts hierarchical visual features at different layers, and it extracts patterns similarly to how human perception works, especially at the deeper layers (at least from what I understood) – Sam Jan 15 '23 at 10:14

0 Answers