
BERT encodes a piece of text such that each token (usually a word or word piece) in the input maps to a vector in the encoding. However, this means the length of the encoding varies with the length of the input text, which makes it cumbersome to use as input to downstream neural networks that only accept fixed-size inputs.
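
For illustration, here is a minimal sketch of the issue (assuming the Hugging Face transformers package and the bert-base-uncased checkpoint, which are not part of the question itself):

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

for text in ["A short sentence.",
             "A considerably longer sentence containing many more tokens than the first one."]:
    outputs = model(**tokenizer(text, return_tensors="pt"))
    # Shape is [1, num_tokens, 768]; num_tokens grows with the input length.
    print(outputs.last_hidden_state.shape)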

Are there any transformer-based neural network architectures that can encode a piece of text into a fixed-size feature vector more suitable for downstream tasks?

Edit: To illustrate my question, I’m wondering whether there is some framework that allows the input to be a sentence, a paragraph, an article, or a book, and produces an output encoding in the same, fixed-size format for all of them.

HelloGoodbye
  • [This](https://ai.stackexchange.com/questions/22957/how-can-transformers-handle-arbitrary-length-input?rq=1#comment35118_22960) might help and partially answers your question. You can only try to reduce the size of the obtained state with some convolution, but I don't think this is being done yet. – N. Kiefer Sep 18 '20 at 13:12
  • BERT does provide a fixed-size output. The encoding of the special `[CLS]` token, which is always prepended to every input example, is meant to encode the entire sentence. Since it is a single token, its encoding is always a fixed-length vector of length _H_, e.g. H=768 for BERT-Base. Specifically, the `[CLS]` encoding is passed through a "pooling layer", which is just an _H×H_ fully-connected layer with tanh activation. This is called `pooled_output` in the [TF Hub module](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/2). – primussucks Sep 21 '20 at 16:14
  • As mentioned, use the `[CLS]` token. Otherwise you could define some max_length, pad to it when the input is shorter, and then use the mean of the token encodings. But look at Sentence-BERT, e.g. https://www.sbert.net/#usage - it generates fixed-size sentence embeddings from varying input sizes (see the sketch after these comments) :) – Isbister Sep 30 '20 at 15:43
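
A minimal sketch of both suggestions from the comments above, assuming the Hugging Face transformers package and the bert-base-uncased checkpoint (the TF Hub module mentioned above works analogously):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Option 1: the pooled [CLS] encoding -- shape [1, 768] regardless of input length.
cls_vector = outputs.pooler_output

# Option 2: mean pooling over the token encodings, ignoring padding positions.
mask = inputs["attention_mask"].unsqueeze(-1)                           # [1, num_tokens, 1]
mean_vector = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)   # [1, 768]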

1 Answer


One way you could do it is by using SentenceTransformers.

SentenceTransformers is a Python framework for state-of-the-art sentence, text, and image embeddings. The initial work is described in the paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

To install it via pip

pip install -U sentence-transformers

To generate sentence embeddings:

from sentence_transformers import SentenceTransformer

# We are using the "paraphrase-MiniLM-L6-v2" model here; you can find the list of
# pretrained models on the SBERT website (https://www.sbert.net).
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# The sentences we want to encode
sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of strings.',
    'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)

# Print the embeddings
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")

But remember that SentenceTransformers models have an input limit as well, usually 512 tokens. If your text is longer than that, this might not be a suitable method.
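
If your text exceeds that limit, one possible workaround is to split it into chunks, encode each chunk, and average the chunk embeddings into a single fixed-size vector. This is only a sketch under the assumption that averaging chunk embeddings is acceptable for your task; it is not something the SentenceTransformers documentation prescribes:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

def encode_long_text(text, words_per_chunk=100):
    # Naive whitespace chunking; splitting on sentence boundaries would be better.
    words = text.split()
    chunks = [' '.join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)]
    chunk_embeddings = model.encode(chunks)   # shape: [num_chunks, 384] for this model
    return chunk_embeddings.mean(axis=0)      # one fixed-size vector, shape: [384]

vector = encode_long_text("This could be a paragraph, an article, or a whole book. " * 500)
print(vector.shape)  # (384,) regardless of the document length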

  • Thanks for your answer. Well, if it has an input limit, the input isn’t truly of arbitrary length (in practice, I guess the input also becomes of fixed length, and you just use a padding token to fill it out after the text has ended). I was wondering whether there is some framework that allows the input to be a sentence, a paragraph, an article, or a book, and produces an output encoding in the same, fixed-size format for all of them. – HelloGoodbye Feb 11 '22 at 21:23
  • Not till today. – Shaida Muhammad Feb 13 '22 at 04:34