We've all read the "Attention Is All You Need" paper, but is it really all you need? Can you effectively replace any RNN/CNN with a Transformer and expect better results?

– Recessive
Note that some LSTM architectures (e.g. for machine translation) published before the Transformer already used some kind of attention mechanism, so the idea of "attention" existed before Transformers. I think you should edit your post to clarify that you're referring to the Transformer rather than the general idea of attention, if that's really the case. Having said that, we have had a few similar questions in the past, like [this](https://ai.stackexchange.com/q/20075/2444), so you might want to look at them and then review your question. – nbro Apr 14 '22 at 09:52