7

GPT-3 has a prompt limit of about 2048 "tokens", each of which corresponds to roughly 4 characters of text. If my understanding is correct, a deep neural network does not learn after it has been trained and is being used to produce outputs, and, as such, this limitation comes from the number of input neurons. My question is: what is stopping us from using the same algorithm we use for training while using the network? That would allow it to adjust its weights and, in a way, provide a form of long-term memory that could let it handle prompts of arbitrary length. Is my line of thinking wrong?

MaiaVictor
  • I'm not well acquainted with machine learning and the involved literature, so this is a very newbie question. I tried to make it as high quality as I could, but I'm certainly limited in how well I can frame questions about a subject I don't know. Hope this kind of question is acceptable here. – MaiaVictor Oct 08 '22 at 15:10

2 Answers

5

In theory, there is nothing stopping you from updating the weights of a neural network whenever you like. You run an example through the network, calculate the difference between the network's output and the answer you expected, and run backpropagation, exactly as you do when you initially train the network. Of course, networks are usually trained on large batches of data rather than single examples, so if you wanted to do a weight update you would want to save up a bunch of data and pass it through as a batch (ideally with the same batch size you used during training, though nothing stops you from using a different one).
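
For concreteness, here is a minimal PyTorch sketch of what such an inference-time update could look like. It assumes a generic trained `model` that returns logits; `batch` and `targets` are hypothetical tensors collected from recent interactions, not anything GPT-3 actually exposes:

    import torch
    import torch.nn.functional as F

    def online_update(model, batch, targets, lr=1e-5):
        # One gradient step after deployment: the same forward/backward
        # pass as in training, just applied to freshly collected data.
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()                                  # enable training mode
        logits = model(batch)                          # forward pass
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               targets.view(-1))       # compare to expected output
        optimizer.zero_grad()
        loss.backward()                                # backpropagation
        optimizer.step()                               # weights actually change
        model.eval()                                   # back to inference mode
        return loss.item()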

Keep in mind this is all theoretical. In practice, adjusting the weights of a deployed network would probably be very difficult, because the model's weights have been exported in a format optimized for inference. It is also generally better to ship distinct releases with fixed sets of weights than to continuously update the same model.

Either way, changing the weights continuously would not affect the "memory" of the network in any way. The maximum sequence length that sequence-to-sequence models like transformers or RNNs can accept is an entirely separate, fixed architectural parameter.
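
To make that concrete, here is a small illustrative sketch (the names and sizes are assumptions, not GPT-3's actual implementation) of how a learned positional-embedding table fixes the maximum prompt length when the architecture is defined:

    import torch
    import torch.nn as nn

    MAX_LEN = 2048   # fixed when the architecture is built, like GPT-3's limit
    D_MODEL = 768    # embedding width, illustrative

    pos_emb = nn.Embedding(MAX_LEN, D_MODEL)  # one learned vector per position

    positions = torch.arange(3000)            # a prompt longer than MAX_LEN
    # pos_emb(positions) would raise an IndexError: positions >= MAX_LEN have
    # no row in the table, and no amount of weight updating adds new rows.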

xojfqa
  • Thanks for all the info. The only thing I don't get is why changing the weights continuously wouldn't allow it to have memory and learn new things. Isn't that how humans learn, by changing the weights of some specific neurons? – MaiaVictor Oct 09 '22 at 00:56
  • GPT-3 is trained on 45 TB of data, so any additional examples you might give it will not make much difference https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai/ – Stefan Oct 09 '22 at 06:37
  • @MaiaVictor The memory in sequence-to-sequence models is more akin to working memory in humans: memory that is stored not in neuron synapses, but in the activity of neurons. The long-term synaptic memory you're referring to is used by the network to learn the overall task. But this doesn't mean changing the weights will let you feed in arbitrary-length sequences: those sequences are stored in the activity of the network's neurons. In RNNs, this working memory is modeled explicitly by gates in each neuron (look up LSTM or GRU). In transformers, it's much more complicated. – xojfqa Oct 09 '22 at 13:16
  • Keep in mind that while what Stefan said is true, you *can* make a big difference in the quality of GPT-3's outputs: not by training the network itself on new examples, but by including examples at the start of your prompt. This is called prompt engineering; good examples are available here https://beta.openai.com/examples/ (see the sketch after these comments). – xojfqa Oct 09 '22 at 13:18
  • @Stefan isn't this just strongly hinting that training is terribly inefficient? Humans have seen way more than 45 TB of raw data in their lifetime, yet they can read a single sentence such as "jellyfish are biologically immortal" and learn a new concept for the rest of their life. Somehow humans are extremely good at storing new "insightful" information they're exposed to very briefly, while ignoring massive amounts of "bland" information (like everything you see while commuting). – MaiaVictor Mar 26 '23 at 20:19
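
To illustrate the prompt-engineering idea from the comments above, here is a minimal sketch (the example pairs and formatting are hypothetical) of steering the model with in-context examples rather than weight updates:

    # Build a few-shot prompt: worked examples are prepended to the query,
    # so behavior changes without any weight update.
    few_shot = [
        ("English: Hello\nFrench:", " Bonjour"),
        ("English: Thank you\nFrench:", " Merci"),
    ]
    query = "English: Good night\nFrench:"

    prompt = "\n\n".join(q + a for q, a in few_shot) + "\n\n" + query
    print(prompt)  # send this string to the model instead of fine-tuning it
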
1

With $175$ billion parameters, GPT-3 is remarkably large and powerful, but it has several limitations and risks associated with its usage. The biggest issue is that GPT-3 can't continue to learn once trained. It has been pre-trained (hence the name: Generative Pre-trained Transformer), which means it doesn't have an ongoing long-term memory that learns from each interaction.

In addition, GPT-3 suffers from the same problem as all neural networks: the difficulty of explaining and interpreting why certain inputs result in specific outputs.

Another reason could be that the model has reached a point of diminishing returns, meaning that any additional training is unlikely to result in significant improvements.

"A significant concern when building AI models like these is diminishing returns—that is, you cannot simply scale the model up forever. At some point, some factor(s) of the model will plateau, whether it’s the information generated, the dataset size, the training regime, etc".

However, at the level of GPT-2, there was no indication that this plateau had been reached. Thus, the “bigger and better” tactic continued, bringing us GPT-3". So, it may also be possible that the model has simply reached a plateau in its learning and is unable to make any further progress.



Faizy