
I have different sets of words as inputs, e.g.,

{governor, John Gary Evans, office, 1894}

or

{cheetah, 80km/h, mammal}

I would like to construct a grammatically correct sentence that contains the full set, or a subset, of these tokens. So the outputs could be:

"Governor John Gary Evans took office in 1894." 

and

"Cheetahs can run at speeds of about 80km/s." 

Output should be one sentence only. How can I use GPT-2 or BERT for this task, and which of the two models is appropriate? I understand one is unidirectional and the other is bidirectional in the way they generate each word of the output. However, I do not know in advance what the grammatically correct order of the input tokens should be, and missing words might need to be inserted at the beginning, in the middle, or at the end of the output sentence.

1 Answer


Example with PyTorch and the Hugging Face transformers library:

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# The LM-head variant is required for text generation; the bare GPT2Model
# only returns hidden states and cannot produce tokens.
model = GPT2LMHeadModel.from_pretrained('gpt2')
text = "governor John Gary Evans office 1894"
encoded_input = tokenizer(text, return_tensors='pt')
# Continue the keyword prompt, then decode the ids back into a string.
output_ids = model.generate(**encoded_input, max_length=30, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
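Note that GPT-2 continues a prompt left to right, so a bare keyword string like the one above often just rambles rather than yielding a single well-formed sentence containing the tokens. One common workaround is few-shot prompting: prepend an example keywords-to-sentence pair so the model imitates the pattern. The sketch below assumes that approach; the example pair, the decoding settings, and the post-processing are my own illustration, not part of the original answer.

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Hypothetical few-shot prompt: one worked keywords -> sentence pair,
# followed by the keywords we actually want a sentence for.
prompt = (
    "Keywords: cheetah, 80km/h, mammal\n"
    "Sentence: The cheetah is a mammal that can run at about 80km/h.\n"
    "Keywords: governor, John Gary Evans, office, 1894\n"
    "Sentence:"
)
inputs = tokenizer(prompt, return_tensors='pt')
output_ids = model.generate(
    **inputs,
    max_new_tokens=25,                    # room for one short sentence
    do_sample=False,                      # greedy decoding keeps the sketch deterministic
    pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
)
# Decode only the newly generated tokens and keep the first line (one sentence).
new_tokens = output_ids[0][inputs['input_ids'].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).split("\n")[0].strip())

As for which of the two models: GPT-2 is the appropriate choice here, since it is trained left to right for open-ended generation, whereas BERT only predicts masked tokens inside an already-existing sentence and has no natural way to emit a new sentence of unknown length and word order.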