BERT encodes a piece of text such that each token (usually a word or word piece) in the input maps to its own vector in the encoding. As a result, the length of the encoding varies with the length of the input text, which makes it cumbersome to use as input to downstream neural networks that only accept fixed-size inputs.
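To make the problem concrete, here is a minimal sketch using the Hugging Face `transformers` library (assuming `bert-base-uncased`) showing that the encoding grows with the number of tokens:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

texts = [
    "A short sentence.",
    "A considerably longer sentence with many more tokens in it than the first one.",
]
for text in texts:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state has shape (batch, num_tokens, 768): one 768-d vector
    # per token, so the middle dimension depends on the input length.
    print(outputs.last_hidden_state.shape)
```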
Are there any transformer-based neural network architectures that can encode a piece of text into a fixed-size feature vector more suitable for downstream tasks?
Edit: To illustrate my question: I’m wondering whether there is some framework that accepts a sentence, a paragraph, an article, or a book as input and produces an output encoding in the same fixed-size format for all of them.
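In other words, something with an interface roughly like the following (the class and method names here are hypothetical, just to illustrate what I mean):

```python
import numpy as np

class FixedSizeEncoder:
    """Hypothetical encoder: the output dimensionality d is fixed,
    no matter how long the input text is."""

    def encode(self, text: str) -> np.ndarray:
        # Should return a vector of shape (d,) whether `text` is a
        # sentence, a paragraph, an article, or an entire book.
        ...
```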