I know that several tokenization methods are used for transformer models, like WordPiece for BERT and BPE for RoBERTa, among others. What I was wondering is whether there is also a transformer that builds its input representations similarly to the embeddings used in the fastText library, i.e. where a word's embedding is the sum of the embeddings of the character n-grams it is made of (roughly like the sketch below).
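To make it concrete, here is a minimal sketch of what I mean (PyTorch; the class name `NgramSumEmbedding`, the bucket count, and the hashing are just my own illustrative choices, not fastText's actual implementation or API):

```python
import torch
import torch.nn as nn

# Illustrative sizes only; fastText's real defaults and hashing differ.
N_GRAM_BUCKETS = 100_000   # hash n-grams into a fixed number of buckets
EMBED_DIM = 300
MIN_N, MAX_N = 3, 6        # character n-gram lengths

def char_ngrams(word: str, min_n: int = MIN_N, max_n: int = MAX_N) -> list[str]:
    """Extract character n-grams from a word wrapped in boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(min_n, max_n + 1)
            for i in range(len(w) - n + 1)]

class NgramSumEmbedding(nn.Module):
    """Embed a word as the sum of hashed character n-gram embeddings."""
    def __init__(self, buckets: int = N_GRAM_BUCKETS, dim: int = EMBED_DIM):
        super().__init__()
        self.ngram_table = nn.Embedding(buckets, dim)
        self.buckets = buckets

    def forward(self, word: str) -> torch.Tensor:
        # Python's hash() is only for illustration; a real implementation
        # would use a deterministic hash function.
        ids = torch.tensor([hash(g) % self.buckets for g in char_ngrams(word)])
        return self.ngram_table(ids).sum(dim=0)   # (dim,) word vector

emb = NgramSumEmbedding()
vec = emb("tokenization")   # a sequence of such vectors could feed a transformer
print(vec.shape)            # torch.Size([300])
```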
To me it seems strange that this way of creating word(-piece) embeddings, which could serve as the input to a transformer, isn't used in these newer transformer architectures. Is there a reason why this hasn't been tried yet? Or is this question just a result of my inability to find the right papers/repos?