
I have a transformer-encoder-only architecture with the following structure:

      Input
        |
        v
     ResNet(-50)
        |
        v
fully-connected (on embedding dimension)
        |
        v
positional-encoding
        |
        v
transformer encoder
        |
        v
Linear layer to alphabet.
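For concreteness, here is a minimal PyTorch sketch of this pipeline. All names and dimensions are placeholders I chose for illustration, and a tiny CNN stands in for ResNet-50:

```python
import torch
import torch.nn as nn

class EncoderOnlyModel(nn.Module):
    """Hypothetical sketch: CNN backbone -> FC -> pos. enc. -> encoder -> alphabet."""
    def __init__(self, d_model=64, alphabet_size=37):
        super().__init__()
        # Stand-in for ResNet-50: a small CNN producing a spatial feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
        )
        self.proj = nn.Linear(128, d_model)                # FC on embedding dim
        self.pos = nn.Parameter(torch.zeros(1, 256, d_model))  # learned pos. encoding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, alphabet_size)      # linear layer to alphabet

    def forward(self, x):
        f = self.backbone(x)                   # (B, C, H', W')
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H'*W', C): one token per cell
        tokens = self.proj(tokens) + self.pos[:, : h * w]
        return self.head(self.encoder(tokens)) # (B, H'*W', alphabet)

model = EncoderOnlyModel()
logits = model(torch.randn(2, 3, 64, 64))
print(logits.shape)  # 64x64 input -> 2x2 feature grid -> 4 tokens
```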

I am trying to visualize the self-attention of the encoder layers to check how each input token attends to the others (e.g. as in https://github.com/jessevig/bertviz).

The difficulty is in visualizing these attention weights in terms of the original input to the ResNet rather than its output, so that the model becomes visually interpretable.
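One idea I have been toying with (numbers below are illustrative): since ResNet-50 downsamples by a factor of 32, each encoder token corresponds to roughly a 32x32 patch of the input, so an attention row could be reshaped back to the feature-map grid and upsampled to input resolution as a heatmap:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: map one token's attention row back to pixel space.
Hf, Wf, stride = 4, 8, 32  # feature-map grid; input would be 128x256
attn_row = torch.softmax(torch.randn(Hf * Wf), dim=0)  # attention from one query token

heat = attn_row.reshape(1, 1, Hf, Wf)  # restore spatial layout of the tokens
heat = F.interpolate(heat, scale_factor=stride, mode="bilinear",
                     align_corners=False)
print(heat.shape)  # (1, 1, 128, 256): overlayable on the input image
```

This assumes a purely spatial token-to-pixel correspondence, which the ResNet's large receptive field only approximates, so I am not sure it is the right way to go.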

Do you have any ideas or suggestions?

asked by John Sig, edited by nbro

0 Answers