I am trying to perform a white-box attack on a model.
Would it be possible to simply use the numerical gradient of the output wrt input directly rather than computing each subgradient of the network analytically? Would this (1) work and (2) actually be a white box attack?
As I would not be using a different model to 'mimic' the results but instead be using the same model to get the outputs, am I right in thinking that this would still be a white box attack.