Transformers work with sequences of vectors: a sentence of length SEQ_LEN, where each word is an embedding vector of size EMBEDDING_DIM. Since the model still uses Dense layers internally (as in https://www.tensorflow.org/text/tutorials/transformer), I'm having trouble understanding how this 2D per-sample input is passed through a Dense layer. Isn't 2D data usually flattened before entering a Dense layer, as in the case of an image?
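For concreteness, here's a minimal sketch of the shapes I have in mind (all sizes are made up). When I pass such a tensor to a Dense layer directly, it doesn't error, which is exactly the part I don't understand:

```python
import tensorflow as tf

# Hypothetical sizes, just for illustration
BATCH_SIZE, SEQ_LEN, EMBEDDING_DIM = 32, 10, 64

# A batch of sentences: each sample is 2D (SEQ_LEN, EMBEDDING_DIM)
x = tf.random.normal((BATCH_SIZE, SEQ_LEN, EMBEDDING_DIM))

# No Flatten layer in between, yet this runs fine
dense = tf.keras.layers.Dense(128)
y = dense(x)
print(y.shape)  # (32, 10, 128)
```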
Actually, in the general case: let's say I have a batch of sentences, with an embedding vector for each word in each sentence. How would I pass this into any layer, whether Dense, RNN, LSTM, etc.?
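For example, with a recurrent layer I would assume the call looks something like the sketch below (again, all sizes are made up), but I'm not sure whether the layer interprets the axes the same way a Dense layer would:

```python
import tensorflow as tf

BATCH_SIZE, SEQ_LEN, EMBEDDING_DIM = 32, 10, 64
x = tf.random.normal((BATCH_SIZE, SEQ_LEN, EMBEDDING_DIM))

# An LSTM consuming the same (batch, seq_len, embedding_dim) tensor,
# returning one output vector per word
lstm = tf.keras.layers.LSTM(128, return_sequences=True)
y = lstm(x)
print(y.shape)  # (32, 10, 128)
```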