Is attention useful only in transformer and convolutional layers? Can I add it to linear (fully connected) layers? If so, how, on a conceptual level (not necessarily the code to implement the layers)?
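For concreteness, here is a rough sketch (PyTorch, all names made up by me) of one way I imagine it could look: a linear layer whose output features are re-weighted by learned attention scores. I'm not asking whether this exact design is right, just whether the general idea makes sense.

```python
import torch
import torch.nn as nn

class AttentiveLinear(nn.Module):
    """Hypothetical example: a dense layer followed by feature-wise attention."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # A second branch scores each output feature from the same input.
        self.attn = nn.Linear(in_features, out_features)

    def forward(self, x):
        h = self.linear(x)                              # ordinary dense features
        weights = torch.softmax(self.attn(x), dim=-1)   # attention weights over features
        return h * weights                              # re-weighted ("attended") features

x = torch.randn(4, 16)
layer = AttentiveLinear(16, 32)
print(layer(x).shape)  # torch.Size([4, 32])
```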