
For a typical adversarial attack, a sample $x_{0}$ belonging to class $C_{i}$ is chosen from a training set, and a transformation $A$ is applied such that the adversarial example $x=A(x_{0})$ is misclassified, e.g. as $x \in C_{target}$. The adversarial example $x$ can be found, for instance, with a gradient-sign step ${x}={x}_{0}+\eta \cdot \operatorname{sign}\left(\nabla_{{x}} J\left({x}_{0} ; {w}\right)\right)$, applied once or iteratively.
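As a concrete sketch, this is roughly what I have in mind for that step (assuming a PyTorch classifier with batched inputs, integer labels, and cross-entropy for $J$; `eta` is just an illustrative hyperparameter):

```python
import torch
import torch.nn.functional as F

def fgsm_step(model, x0, y, eta):
    """One gradient-sign step: x = x0 + eta * sign(grad_x J(x0; w))."""
    x = x0.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)  # J(x0; w) with the true label y
    loss.backward()
    # Moving *up* the loss gradient pushes x0 away from its true class C_i.
    return (x + eta * x.grad.sign()).detach()
```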

In this TextAttack documentation, I read:

"Out of those 157, 29 attacks failed, leading to a success rate of 128/157 or 81.5%."

My question is: how do we choose the starting samples $x_{0}$ (here the 157) to evaluate the success rate of a type of adversarial attack? Do we choose several target classes? Several initial classes $C_{i}$? How many samples do we choose? From which set (training? validation?)?

In the DeepFool paper, they use the entire test set, but is it standard practice to claim the success rate of a type of attack this way?
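To make the question concrete, here is roughly the evaluation loop I imagine, assuming a hypothetical `attack` function like the sketch above and an already-trained `model`; the sample size, the sampling strategy, and the choice of set are exactly the things I am unsure about:

```python
import torch

def attack_success_rate(model, attack, samples, eta=0.03):
    """Untargeted success rate: fraction of attacked samples whose label flips."""
    successes, attempts = 0, 0
    for x0, y in samples:  # e.g. 157 examples drawn from... which set?
        x0 = x0.unsqueeze(0)
        y = torch.tensor([y])
        # Only attack samples the model classifies correctly to begin with.
        if model(x0).argmax(dim=1) != y:
            continue
        x_adv = attack(model, x0, y, eta)
        attempts += 1
        if model(x_adv).argmax(dim=1) != y:
            successes += 1
    return successes / max(attempts, 1)
```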

  • See: https://ai.stackexchange.com/a/14264/17742 and https://ai.stackexchange.com/a/9767/17742 - the success rate is supposed to be 100%; otherwise it's not an attack, only an attempt. – Rob Aug 08 '22 at 09:13
  • The adversarial example is supposed to have a 1/1 success rate, but I can't find a consistent way to choose the samples to evaluate the type of attack throughout the literature – VirginieDlpts Aug 08 '22 at 09:22

0 Answers