
I am trying to perform a white-box attack on a model.

Would it be possible to simply use the numerical gradient of the output with respect to the input directly, rather than computing each subgradient of the network analytically? Would this (1) work and (2) actually be a white-box attack?

Since I would not be using a different model to 'mimic' the results, but would instead be querying the same model for its outputs, am I right in thinking that this would still be a white-box attack?
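For concreteness, something like the following is what I have in mind (a minimal sketch; `loss_fn` is a hypothetical callable that runs the model on an input array and returns a scalar, such as the loss or a target logit):

```python
import numpy as np

def finite_difference_grad(loss_fn, x, eps=1e-4):
    """Estimate d(loss)/d(x) by central differences, one coordinate at a time.

    loss_fn: hypothetical callable that evaluates the model on an input array
             and returns a scalar (e.g. the loss or a chosen logit).
    x:       input array (e.g. a flattened image), float dtype.
    eps:     finite-difference step size.
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        offset = np.zeros_like(x, dtype=float)
        offset.flat[i] = eps
        # Central difference approximates the partial derivative in coordinate i
        grad.flat[i] = (loss_fn(x + offset) - loss_fn(x - offset)) / (2 * eps)
    return grad

def fgsm_like_step(loss_fn, x, step=0.01):
    """One FGSM-style perturbation using the numerically estimated gradient."""
    g = finite_difference_grad(loss_fn, x)
    return x + step * np.sign(g)
```

(I realise this needs roughly `2 * x.size` model evaluations per gradient estimate, so it would be far slower than backpropagation for image-sized inputs, but my question is about whether it still counts as a white-box attack, not about efficiency.)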

nbro
    What exactly do you mean by `numerical gradient of the output`? If you mean using finite differences, this is generally considered a black-box attack. Usually the 'white-box attack' setting assumes that the adversary has full access to the model. The 'black-box attack' setting generally assumes that you only have access to the outputs, either in the form of logits or (even more restrictively) just the class prediction (i.e. the argmax of the logits) – Chris Cundy Apr 12 '20 at 03:27

0 Answers