1

Convolutions can be expressed as a matrix-multiplication (see e.g. this post) and as an element-wise multiplication using the Fourier domain (https://en.wikipedia.org/wiki/Convolution_theorem).

Attention utilizes matrix multiplications, and is as such $O(n^2)$. So, my question is, is it possible to exploit the Fourier domain for attention mechanisms by turning the matrix multiplication of attention into a large convolution between the query and the key matrices?

nbro
  • 39,006
  • 12
  • 98
  • 176

0 Answers0