
Most machine learning models, such as multilayer perceptrons, require a fixed-length input and output, but generative (pre-trained) transformers can produce sentences or full articles of variable length. How is this possible?


1 Answer


In short, repetition with feedback.

You are correct that machine learning (ML) models such as neural networks work with fixed input and output dimensions. There are a few ways to work around this when the desired input and output are more variable. The most common approaches are:

  • Padding: Give the ML model enough capacity to cope with the largest expected dimensions, then pad inputs and filter outputs as necessary to match the logical requirements (see the sketch after this list). For example, this might be used for an image classifier where the input images vary in size and shape.

  • Recurrent models: Add an internal state to the ML model and pass it forward along with each input, so that the model can process sequences of related inputs or outputs one element at a time. This is a preferred architecture for natural language processing (NLP) tasks, where LSTMs, GRUs, and transformer networks are common choices.
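For instance, here is a minimal padding sketch in Python/NumPy. The function name, shapes, and pad value are just illustrative, not any particular library's API:

    import numpy as np

    def pad_batch(sequences, max_len, pad_value=0.0):
        """Pad variable-length sequences to a fixed length so a
        fixed-input model can process them as one batch."""
        batch = np.full((len(sequences), max_len), pad_value, dtype=np.float32)
        for i, seq in enumerate(sequences):
            batch[i, :min(len(seq), max_len)] = seq[:max_len]
        return batch

    # Three inputs of different lengths become one fixed-shape (3, 4) array.
    print(pad_batch([[1, 2], [3, 4, 5], [6]], max_len=4))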

A recurrent model relies on the fact that each input and output is the same kind of thing, just at a different point in the sequence. The internal state of the model combines information across points in the sequence, so that, for example, a word at position three of the input can influence the choice of word at position seven of the output.
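As a minimal sketch of that internal state (the weights and sizes here are arbitrary, just to show the mechanics): each step folds the current input into a state vector that summarises everything seen so far.

    import numpy as np

    rng = np.random.default_rng(0)
    W_state = rng.normal(size=(4, 4)) * 0.1   # state-to-state weights
    W_in = rng.normal(size=(3, 4)) * 0.1      # input-to-state weights

    def rnn_step(state, x):
        """One recurrent step: the new state mixes the previous state
        (a summary of all earlier positions) with the current input."""
        return np.tanh(state @ W_state + x @ W_in)

    # Information from the first input persists in the state at later steps.
    state = np.zeros(4)
    for x in rng.normal(size=(3, 3)):         # a sequence of three inputs
        state = rnn_step(state, x)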

Generative recurrent models often use their own output (or a sample drawn from the probabilities expressed in the output) as the input for the next step. Generative transformers rely on the same feedback idea: at each step the model re-reads the whole sequence so far (the prompt plus everything it has generated), predicts the next token, and appends it, so the output grows one token at a time until an end-of-sequence token is produced.
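A minimal sketch of that feedback loop, assuming a hypothetical model callable that maps the sequence so far to a probability distribution over the vocabulary (the names and stopping rule are illustrative):

    import numpy as np

    def generate(model, prompt_tokens, end_token, max_steps=50):
        """Feed the model's own sampled output back in as the next input
        until it emits an end-of-sequence token or hits the step limit."""
        tokens = list(prompt_tokens)
        for _ in range(max_steps):
            probs = model(tokens)                 # probabilities over the vocabulary
            next_token = int(np.random.choice(len(probs), p=probs))
            tokens.append(next_token)             # feedback: output becomes input
            if next_token == end_token:
                break
        return tokens

This is why the output length is variable even though each individual forward pass of the model has fixed dimensions: the loop simply runs for a different number of steps each time.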

It is well worth reading this blog post for an introduction and some examples: The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy.

Neil Slater
  • This answer could probably do with an update to explain how transformers do this. I am not 100% sure, but it probably involves manipulating the input based on previous output, similar to, but not the same as RNNs. – Neil Slater Sep 01 '23 at 08:14