Convolutions can be expressed as a matrix multiplication (see e.g. this post) and as an element-wise multiplication in the Fourier domain (https://en.wikipedia.org/wiki/Convolution_theorem).
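For concreteness, here is a small numpy sketch of the convolution-theorem identity I mean (just a toy illustration, the signal names are made up):

```python
import numpy as np

# Toy example: linear convolution computed directly vs. via the FFT.
x = np.random.randn(128)   # "signal"
k = np.random.randn(32)    # "kernel"

# Direct linear convolution, O(n*m).
direct = np.convolve(x, k)

# Convolution theorem: zero-pad to the full output length, multiply
# element-wise in the Fourier domain, then transform back, O(n log n).
n = len(x) + len(k) - 1
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)

print(np.allclose(direct, via_fft))  # True
```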
Attention relies on matrix multiplications and is therefore $O(n^2)$ in the sequence length. So my question is: is it possible to exploit the Fourier domain for attention mechanisms by turning the matrix multiplication in attention into a large convolution between the query and key matrices?
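For reference, this is the attention computation I have in mind (my own minimal sketch of scaled dot-product attention, not taken from any particular library), just to show where the $O(n^2)$ term comes from:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention on (n, d) matrices."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) matrix: the O(n^2) part
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # another O(n^2 * d) product

n, d = 256, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(attention(Q, K, V).shape)  # (256, 64)
```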