
I have different sets of words as inputs, e.g.,

{governor, John Gary Evans, office, 1894}

or

{cheetah, 80km/h, mammal}

I would like to construct a grammatically correct sentence that contains the full set, or a subset, of these tokens. So the outputs could be:

"Governor John Gary Evans took office in 1894." 

and

"Cheetahs can run at speeds of about 80km/s." 

Output should be one sentence only. How can I use GPT-2 or BERT for this task, and which of the two models is appropriate? I understand one is unidirectional and the other is bidirectional in the way they generate each word of the output. However, I do not know in advance what the grammatically correct order of the input tokens should be, and missing words might need to be inserted at the beginning, in the middle, or at the end of the output sentence.

1 Answer


Example with PyTorch and the Hugging Face transformers library:

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# The LM-head variant is required for text generation; the bare GPT2Model
# only returns hidden states and cannot produce tokens.
model = GPT2LMHeadModel.from_pretrained('gpt2')
text = "governor John Gary Evans office 1894"
encoded_input = tokenizer(text, return_tensors='pt')
# Continue the keyword prompt, then decode the ids back into a string.
output_ids = model.generate(**encoded_input, max_length=30, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
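Note that GPT-2 continues a prompt left to right, so a bare keyword string like the one above often just rambles rather than yielding a single well-formed sentence containing the tokens. One common workaround is few-shot prompting: prepend an example keywords-to-sentence pair so the model imitates the pattern. The sketch below assumes that approach; the example pair, the decoding settings, and the post-processing are my own illustration, not part of the original answer.

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Hypothetical few-shot prompt: one worked keywords -> sentence pair,
# followed by the keywords we actually want a sentence for.
prompt = (
    "Keywords: cheetah, 80km/h, mammal\n"
    "Sentence: The cheetah is a mammal that can run at about 80km/h.\n"
    "Keywords: governor, John Gary Evans, office, 1894\n"
    "Sentence:"
)
inputs = tokenizer(prompt, return_tensors='pt')
output_ids = model.generate(
    **inputs,
    max_new_tokens=25,                    # room for one short sentence
    do_sample=False,                      # greedy decoding keeps the sketch deterministic
    pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
)
# Decode only the newly generated tokens and keep the first line (one sentence).
new_tokens = output_ids[0][inputs['input_ids'].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).split("\n")[0].strip())

As for which of the two models: GPT-2 is the appropriate choice here, since it is trained left to right for open-ended generation, whereas BERT only predicts masked tokens inside an already-existing sentence and has no natural way to emit a new sentence of unknown length and word order.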