
I find it difficult to understand the following from the GPT-2 paper.

Language modeling is also able to, in principle, learn the tasks of McCann et al. (2018) without the need for explicit supervision of which symbols are the outputs to be predicted. Since the supervised objective is the same as the unsupervised objective but only evaluated on a subset of the sequence, the global minimum of the unsupervised objective is also the global minimum of the supervised objective.

I assume the supervised objective is some loss function used when training on labeled datasets of the form (task, input) -> output. But what is the unsupervised objective here?
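My best guess is that the unsupervised objective is just the usual autoregressive language-modeling loss over the raw training text, something like (my own notation, not the paper's)

$$
\mathcal{L}_{\text{unsup}}(\theta) = -\sum_{t=1}^{T} \log p_\theta(s_t \mid s_1, \dots, s_{t-1}),
$$

where $s_1, \dots, s_T$ is a token sequence from the training corpus, but I am not sure whether that is what is meant.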

And why does it say that the supervised objective is only evaluated on a subset of the sequence? My understanding is that GPT-2 is never explicitly trained on these tasks, so in what sense is the supervised objective evaluated on a subset of the sequence?
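To make the question concrete, here is a toy sketch of the only reading I can come up with: a (task, input, output) example is laid out as a single token stream, the language-modeling loss is computed at every position, and the "supervised" loss would be the same loss restricted to the positions holding the output tokens. The tensors, sequence length, and choice of which positions count as output below are all made up for illustration.

```python
import torch
import torch.nn.functional as F

T, vocab_size = 8, 50257                      # toy sequence length, GPT-2 vocabulary size
logits = torch.randn(T, vocab_size)           # stand-in for a model's next-token logits
targets = torch.randint(0, vocab_size, (T,))  # stand-in for the actual next tokens

# Language-modeling (unsupervised) objective: cross-entropy at every position.
unsupervised_loss = F.cross_entropy(logits, targets)

# My reading of the "supervised" objective: the same cross-entropy, but only
# evaluated at the positions that hold the output/answer tokens (here the last
# two positions, purely for illustration).
output_mask = torch.zeros(T, dtype=torch.bool)
output_mask[-2:] = True
supervised_loss = F.cross_entropy(logits[output_mask], targets[output_mask])

print(unsupervised_loss.item(), supervised_loss.item())
```

Is that the intended reading, and if so, how does it apply when GPT-2 is never shown such labeled examples during training?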

I would appreciate it if someone could help me understand.
