
Is attention only useful in transformer/convolutional layers, or can I also add it to linear (fully connected) layers? If so, how would that work on a conceptual level (I'm not asking for the code to implement the layers)?

pentavol
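For concreteness, here is a minimal sketch of one way attention could be combined with an ordinary linear layer: reinterpret the layer's input features as a small set of "tokens", run scaled dot-product self-attention over those tokens, and then apply the usual affine map. Everything in it is an illustrative assumption (plain NumPy, an arbitrary grouping of a 32-dim input into 8 tokens of 4 features, made-up weight shapes), not a reference implementation from any library or paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """tokens: (n_tokens, d_tok). Standard scaled dot-product self-attention."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

# Hypothetical sizes: a 32-dim input viewed as 8 "tokens" of 4 features each.
d_in, n_tokens, d_tok, d_out = 32, 8, 4, 16
Wq = rng.normal(size=(d_tok, d_tok)) * 0.1
Wk = rng.normal(size=(d_tok, d_tok)) * 0.1
Wv = rng.normal(size=(d_tok, d_tok)) * 0.1
W_linear = rng.normal(size=(d_in, d_out)) * 0.1
b_linear = np.zeros(d_out)

x = rng.normal(size=(d_in,))            # input that would feed the linear layer
tokens = x.reshape(n_tokens, d_tok)     # reinterpret the features as tokens
attended = self_attention(tokens, Wq, Wk, Wv)
y = attended.reshape(d_in) @ W_linear + b_linear  # ordinary affine map afterwards
print(y.shape)  # (16,)
```

The point of the sketch is only that attention is not tied to convolutions or to a full transformer block: anything that can be viewed as a set of feature vectors can be attended over before (or instead of part of) a plain linear transformation.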