
To check whether the visitor of a page is a human and not an AI, many websites and applications use a verification procedure known as a CAPTCHA. These tasks are intended to be easy for people but unsolvable for machines.

However, some of these challenges are difficult even for humans, like deciphering badly overlapping digits or telling whether a bus appears in an image tile.

As far as I understand, robustness against adversarial attacks is still an unsolved problem. Moreover, adversarial perturbations are fairly general and transferable across architectures (according to https://youtu.be/CIfsB_EYsVI?t=3226). This phenomenon applies not only to deep neural networks but also to simpler linear models.

Given the current state of affairs, it seems like a good idea to build CAPTCHAs from such adversarial examples: the classification task would remain simple for humans, who would not need several attempts to pass the test, but hard for AI.

There is some research in this area and a few proposed solutions, but they do not seem to be widely adopted.

Are there other problems with this approach, or do website (application) owners simply prefer not to rely on it?

2 Answers


I think the problem is that this type of attack will only work against the model that was used to produce the perturbations. These perturbations are computed by backpropagating the loss for an image of, say, a panda back to the input pixels, using a wrong target label such as "airplane".

In other words, the perturbations are nothing more than gradients indicating the direction in which each pixel needs to be changed to make the panda look like an airplane to that particular model. Since the same model will end up with different weights after each training run, this attack will only work against the model that was used to generate the gradients.
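For concreteness, here is a minimal sketch of such a one-step, gradient-based (FGSM-style) targeted attack. It assumes PyTorch and a pretrained ResNet-18 as the "particular model"; the input image, target class index, and epsilon are placeholders, not anything from the answer itself.

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    # The model used to craft the perturbation ("that particular model").
    model = models.resnet18(pretrained=True).eval()

    def targeted_fgsm(image, target_class, epsilon=0.03):
        """One gradient step that nudges every pixel in the direction that
        makes the model more confident in the (wrong) target_class."""
        image = image.clone().detach().requires_grad_(True)
        logits = model(image.unsqueeze(0))
        loss = F.cross_entropy(logits, torch.tensor([target_class]))
        loss.backward()
        # Step against the loss gradient so the wrong label becomes more likely.
        adversarial = image - epsilon * image.grad.sign()
        return adversarial.detach().clamp(0, 1)

    # Hypothetical usage: push a random "image" toward ImageNet class 404 ("airliner").
    adv = targeted_fgsm(torch.rand(3, 224, 224), target_class=404)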

Here is an illustrative example of this idea when training a generator in a GAN:

[image: training a generator in a GAN]

Update

While we can transfer an adversarial attack from one model to another, this is only possible under strict constraints. To successfully generate perturbations for the target model, we first need to know the dataset that was used to train it. We also need to know the architecture, including the activation and loss functions, as well as the hyperparameters of the model. Here is a work in which the authors take a closer look at this topic.
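As a rough illustration of what "transfer" means here, one could craft the perturbation on one pretrained model and check whether a second, independently trained model is fooled as well. This sketch assumes PyTorch; the two models, the image, and epsilon are placeholders.

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    source = models.resnet18(pretrained=True).eval()     # model used to craft the attack
    target = models.densenet121(pretrained=True).eval()  # different, independently trained model

    def attack_transfers(image, wrong_class, epsilon=0.03):
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(source(image.unsqueeze(0)), torch.tensor([wrong_class]))
        loss.backward()
        adv = (image - epsilon * image.grad.sign()).detach().clamp(0, 1)
        # The attack "transfers" only if the *target* model also predicts the wrong class.
        return target(adv.unsqueeze(0)).argmax(dim=1).item() == wrong_class

    print(attack_transfers(torch.rand(3, 224, 224), wrong_class=404))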

Even though it is possible, in my opinion using adversarial examples as CAPTCHAs does not make sense, as these attacks may not work in the real world. For example, if we apply such an attack to a road sign to trick a vehicle's autopilot, the lighting conditions and camera perspective can significantly affect the classification.

Aray Karjauv
  • Thanks for the nice pic. However, if these perturbations only worked for a single model and had no impact on others, that would make such CAPTCHAs meaningless. But when training against an ensemble of models, one can craft fairly universal perturbations that are capable of fooling other models as well (Table 4 of https://arxiv.org/pdf/1611.02770.pdf). That work is not very new, so it seems reasonable to me that combining EfficientNets, NFNets, ViTs and CaiTs would produce reliable and universal adversarial perturbations (a rough sketch of this idea follows these comments). – spiridon_the_sun_rotator Jul 04 '21 at 07:34
  • Thanks for pointing me to this work. But I still don't believe it will work in practice. The authors state that they are unaware of the dataset being used, but I suspect that the target platform (at least in part) used publicly available datasets like ImageNet as well as pre-trained models. I've updated my answer accordingly. Anyway, this topic is very interesting. – Aray Karjauv Jul 04 '21 at 11:20
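A rough sketch of the ensemble idea from the comment above, assuming PyTorch: the perturbation is optimized against the summed loss of several pretrained models at once, in the hope that it also fools models outside the ensemble. The specific models, the image, and epsilon are placeholders.

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    # A small, hypothetical ensemble of pretrained ImageNet classifiers.
    ensemble = [models.resnet18(pretrained=True).eval(),
                models.vgg11(pretrained=True).eval(),
                models.mobilenet_v2(pretrained=True).eval()]

    def ensemble_perturbation(image, wrong_class, epsilon=0.03):
        image = image.clone().detach().requires_grad_(True)
        # Sum the targeted losses of all ensemble members before backpropagating,
        # so the single perturbation has to fool every member at once.
        loss = sum(F.cross_entropy(m(image.unsqueeze(0)), torch.tensor([wrong_class]))
                   for m in ensemble)
        loss.backward()
        return (image - epsilon * image.grad.sign()).detach().clamp(0, 1)

    adv = ensemble_perturbation(torch.rand(3, 224, 224), wrong_class=404)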

Because adversarial examples are fit to a particular ML model, and if you train a model with different parameters they probably won't remain valid.

FourierFlux