
I know that several tokenization methods are used for transformer models, like WordPiece for BERT and BPE for RoBERTa, among others. What I was wondering is whether there is also a transformer that uses a tokenization method similar to the embeddings used in the fastText library, i.e. word embeddings built by summing the embeddings of the character n-grams the words are made of.
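For concreteness, this is roughly what I mean by the fastText approach. It is only a minimal sketch with made-up embedding vectors; real fastText hashes the n-grams into a fixed number of buckets and also adds a vector for the whole word, and the names `char_ngrams` and `word_vector` are just illustrative:

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers, fastText-style."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(wrapped[i:i + n] for i in range(len(wrapped) - n + 1))
    return grams

# Toy embedding table: one random vector per n-gram seen so far.
dim = 8
rng = np.random.default_rng(0)
ngram_vectors = {}

def word_vector(word):
    """Word embedding = sum of the embeddings of its character n-grams."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        if g not in ngram_vectors:
            ngram_vectors[g] = rng.normal(size=dim)
        vec += ngram_vectors[g]
    return vec

print(char_ngrams("where", 3, 3))   # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("where").shape)   # (8,) -- could in principle be fed to a transformer
```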

To me it seems odd that this way of creating word(piece) embeddings, which could serve as the input of a transformer, isn't used in these new transformer architectures. Is there a reason why this hasn't been tried yet? Or is this question just a result of my inability to find the right papers/repos?

Michiel

1 Answer


There is a pre-trained sequence-to-sequence language model called ProphetNet, which is trained with a novel self-supervised objective called future n-gram prediction.

https://github.com/microsoft/ProphetNet

There are also a few variants on the Hugging Face model hub: https://huggingface.co/models?search=ProphetNet
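For example, assuming you have the transformers library installed, one of those checkpoints can be loaded like this (the checkpoint name below is one listed on the hub, and the input sentence and generation length are just illustrative):

```python
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

# Load the uncased ProphetNet checkpoint from the Hugging Face hub.
tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased")

# Encode an example sentence and generate a short output sequence.
inputs = tokenizer("ProphetNet predicts future n-grams.", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```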

usct01