A common illustration of how CNNs work is the following: https://www.researchgate.net/figure/Learned-features-from-a-Convolutional-Neural-Network_fig1_319253577. It seems to suggest that a CNN classifies images in much the same way a human does, i.e., based on visual features.
So why do adversarial attacks such as FGSM still work on CNNs? If the perturbation is strong enough for the CNN to pick up, shouldn't a human also be able to tell the difference?
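For concreteness, FGSM (Goodfellow et al., "Explaining and Harnessing Adversarial Examples") nudges every input dimension by at most ε in the direction of the sign of the loss gradient. The part that puzzles me is that ε can be tiny per pixel yet the effect on the output can be huge, because the small shifts add up across many dimensions. Here is a minimal sketch of that effect on a toy linear classifier (the model, weights, and ε are all made up for illustration, not taken from any paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "classifier": logit = w @ x + b, predict class 1 if logit > 0.
d = 10_000                            # number of input dimensions ("pixels")
w = rng.choice([-1.0, 1.0], size=d)   # hypothetical weights
x = rng.uniform(0.0, 1.0, size=d)     # a toy "image", true label y = 0
b = -w @ x - 5.0                      # bias chosen so the clean input is confidently class 0

def logit(inp):
    return w @ inp + b

# FGSM step for y = 0: the input-gradient of the cross-entropy loss is
# (p - y) * w with p in (0, 1), so its sign is simply sign(w).
eps = 0.01                            # per-pixel budget: visually negligible
x_adv = x + eps * np.sign(w)

print(logit(x))                       # ≈ -5  -> class 0 (correct)
print(logit(x_adv))                   # ≈ -5 + eps * d = 95 -> class 1 (fooled)
print(np.max(np.abs(x_adv - x)))      # 0.01: the largest change to any pixel
```

Each pixel moves by only 0.01, far below what a human would notice, yet the logit shifts by ε·d = 100 because every one of the 10,000 small nudges is aligned with the gradient. This is exactly the kind of behavior I am asking about in the CNN case.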