
One of the innovations with OpenAI's ChatGPT is how natural it is for users to interact with it.

What is the technical enabler for ChatGPT to maintain the context of previous questions in its answers? For example, ChatGPT understands a prompt of "tell me more" and expands on its previous answer.

Does it use activations from previous questions? Is there a separate input for the context? How does it work?

milez
  • ChatGPT says: "ChatGPT uses a type of natural language processing called "transformer" architecture, which is a type of neural network that is designed to process and generate language. One of the key features of transformer architecture is that it allows the model to maintain a "context" of previous inputs. This is achieved through the use of self-attention mechanisms, which allow the model to weight different parts of the input differently based on how relevant they are to the current context. This allows the model to generate more nuanced and context-aware responses." – Rainb Dec 08 '22 at 07:30
  • It does, indeed, but the transformer architecture alone does not explain this capability. Maintaining context requires something else. In fact, I _think_ being a transformer is irrelevant for this purpose. – milez Dec 09 '22 at 06:13
  • My best guess is that the whole conversation is fed as input for the next reply. But looking into OpenAI's labelling guidance, it looks like the data collection is single Q&A only. – Rexcirus Dec 09 '22 at 10:21
  • If my guess is correct, when the conversation is very long, ChatGPT should start forgetting previous bits due to the maximum sequence size. – Rexcirus Dec 09 '22 at 10:22
  • The ability to maintain context is very mysterious. What is most mysterious to me is that if I continuously converse with ChatGPT it has no problem maintaining context. On the other hand, if I break in the middle of a conversation, and then come back to it some time later, it's as if ChatGPT has had a memory wipe and has a lot of trouble regaining the thread of the conversation. Since ChatGPT is a neural network where the output is generated based on the previous 2048 input tokens, the amount of time between responses should have no bearing on the output generated. – yters Jan 09 '23 at 12:54
  • Great observation @yters. Could it be a bug: the interface correctly shows the previous conversation, but the service fails to send this to the OpenAI APIs after some time. – milez Jan 10 '23 at 11:48
  • I don't have enough reputation to answer this question, but I've been playing around with text-davinci-003 quite a bit and can simulate the history that ChatGPT achieves with a relatively simple trick: in the prompt I ask it to summarize the conversation so far. I then pass that summary into the next prompt along with as much of the conversation as I think is appropriate to fit inside the token limit. text-davinci-003 does a very nice job of concisely summarizing the conversation. It's not as good as ChatGPT, but I do believe this is part of the magic. – RLC Feb 01 '23 at 17:11
  • @Rainb why didn't you put that as an answer? Follow-up question: do you have any references about these “transformer” models? Does OpenAI have any available to the public? – Antonio Feb 18 '23 at 09:21
  • @Antonio the transformer is now a standard building block of deep neural networks used widely in image, audio and language processing. Hence that answer by ChatGPT is not really informative. There are plenty of transformer models available on Hugging Face, for example, and most definitely all of OpenAI's language and image processing models employ the architecture. – milez Feb 18 '23 at 12:11
  • I imagine something like the “memorizing Transformer” architecture will be adapted in time, if not already for GPT-4. https://arxiv.org/abs/2203.08913 – Evan Zamir Mar 14 '23 at 21:33

2 Answers


Based on an answer by OpenAI, it appears that the illusion of understanding context is created by the model's capacity to accept very long input sequences. The OpenAI FAQ states that approximately 3000 words can be given as input. This, together with the fact that GPT-3 was trained to produce text that continues a given prompt, could explain the context feature.

In practice, each prompt is probably extended with the previous prompts and outputs, as far as the input sequence length allows. So all of the context is actually in the prompt, as sketched below.
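To make this concrete, here is a minimal sketch of that idea in Python. This is not OpenAI's actual implementation; the token budget and the crude word-count heuristic in `count_tokens` are assumptions (a real system would use an actual tokenizer such as tiktoken):

```python
MAX_TOKENS = 4000  # assumed budget; the real limit depends on the model

def count_tokens(text: str) -> int:
    # Crude stand-in: roughly 4 tokens per 3 words for English text.
    return int(len(text.split()) * 4 / 3)

def build_prompt(history: list[tuple[str, str]], new_message: str) -> str:
    """Keep as many of the most recent (user, bot) turns as fit in the
    budget, then place them before the new message in chronological order."""
    budget = MAX_TOKENS - count_tokens(new_message)
    kept = []
    for user_msg, bot_msg in reversed(history):
        turn = f"User: {user_msg}\nAssistant: {bot_msg}\n"
        cost = count_tokens(turn)
        if cost > budget:
            break  # older turns silently drop out of the context window
        kept.append(turn)
        budget -= cost
    return "".join(reversed(kept)) + f"User: {new_message}\nAssistant:"
```

If something like this is what happens, it would also explain the forgetting behaviour noted in the comments: once the conversation outgrows the budget, the oldest turns are simply no longer part of the prompt.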

Edit 18.2.2023

After spending some time with large language models and reading up on the theory, I think my old answer understates ChatGPT's capabilities.

It is likely that several engineering approaches are used to improve the context after the maximum context length is exceeded. These include (but are probably not limited to) the following; a sketch of the second approach follows the list.

  1. Using language models to summarize the conversation thus far, and using that as context
  2. Using language models to search for the relevant context from the previous discussion (can be done by embedding questions and answers and doing a distance-based lookup in vector space), and feeding those as context with clever prompting like "If this information improves your answer, update your answer accordingly".
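Here is a minimal sketch of approach 2. The bag-of-words "embedding" below is a toy stand-in so the example runs on its own; a real system would use a learned embedding model (e.g. an embeddings API) and usually a proper vector store:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def relevant_context(question: str, past_turns: list[str], k: int = 3) -> list[str]:
    """Return the k past turns closest to the new question in vector space."""
    q_vec = embed(question)
    return sorted(past_turns, key=lambda t: cosine(embed(t), q_vec),
                  reverse=True)[:k]
```

The retrieved turns would then be pasted into the prompt together with an instruction along the lines of "If this information improves your answer, update your answer accordingly".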
milez
  • So does that imply that if you keep chatting long enough, ChatGPT starts to lose context as well, since you pass the threshold? That would certainly explain some brain-dead responses I get sometimes... like "I just told you that, you already forgot!?" – Benny Bottema Jan 17 '23 at 09:03
  • Yes, I think so. Most likely the latest messages (up to the token limit) are included as context, so that is the effective working memory ChatGPT has. – milez Jan 20 '23 at 06:26
  • I did some experiments and it seems the token limit is not a hard one. Or at least the context doesn't completely go away. I don't know how they determine what is kept, but after flushing ChatGPT with a lot of content (or rather having it flood me), it still remembered the fine details of the conversation's start. It's rather impressive. – Benny Bottema Jan 21 '23 at 09:42
  • Agreed. Mathematically, they can retain the previous content through marginalization (https://www.quora.com/What-is-marginalization-in-probability). This can be done using Bayesian assumptions or simply a DL model. Of course there is a lot of good engineering involved. The amazing part is that most folks in the NLU space know how to create ChatGPT in theory, but it takes a very good engineering/science team, plus resources, to get this done correctly and efficiently. – Keeeeeenw Feb 18 '23 at 16:22
  • Do you think ChatGPT used my interactions in the current session to enhance its answers? I think so based on my experience, but I am not sure. – Avv Jun 15 '23 at 15:49
  • @Avv yes it can use previous interactions in the current chat. But those will not carry over to future discussions. Not in real time anyway. Unless you opt out of the training data collection, your discussions on the chat UI may be used to train OpenAI models. – milez Jun 22 '23 at 05:03
  • @milez From my experience I noticed that it's aware of all my previous interactions and uses them to answer current interactions, but you might be right that it does not use them in a separate chat unless I start all over. – Avv Jun 22 '23 at 15:40
  • @Keeeeeenw What is marginalization for conversation or LLM specifically? – JP Zhang Aug 05 '23 at 06:09
  • What I wish we knew, and as far as I know OpenAI hasn't disclosed, is the actual strategy they use for managing context in lengthy conversations. I've noticed it can still remember "old" bits of the convo while quickly forgetting "new" things sometimes, so I think it's not simply truncating the oldest messages. Rather, they're doing some more advanced processing to conversation history before loading it into context, but unfortunately, I don't know exactly what, and we may not have any avenue to find out. Like @RLC suggested, summarization probably plays some part in that. – Ascendant Sep 01 '23 at 02:45

Based on nothing but my own experience trying to build a similar chatbot using text-davinci-003, I think they are using the model itself to summarize the conversation, then feeding that summary back into the prompt. I get good results when doing this - though obviously the team behind ChatGPT does a better job.

Here is an example using the discussion under the question above:

You are a user named "AI" in an online forum. Below is a conversation about a question. Please add your contribution to the conversation.
Question: """
One of the innovations with OpenAI's ChatGPT is how natural it is for users to interact with it.
What is the technical enabler for ChatGPT to maintain the context of previous questions in its answers? For example, ChatGPT understands a prompt of "tell me more" and expands on its previous answer.
Does it use activations from previous questions? Is there a separate input for the context? How does it work? """
Rainb: """
ChatGPT says: "ChatGPT uses a type of natural language processing called "transformer" architecture, which is a type of neural network that is designed to process and generate language. One of the key features of transformer architecture is that it allows the model to maintain a "context" of previous inputs. This is achieved through the use of self-attention mechanisms, which allow the model to weight different parts of the input differently based on how relevant they are to the current context. This allows the model to generate more nuanced and context-aware responses." """
milez: """
It does, indeed, but the transformer architecture alone does not explain this capability. Maintaining context requires something else. In fact, I think being a transformer is irrelevant for this purpose. """
Rexcirus: """
My best guess is that the whole conversation is fed as input for the next reply. But looking into openai labelling guidance, it looks that the data collection is single Q&A only. """
Rexcirus: """
If my guess is correct, when the conversation is very long ChatGPT should start forgetting previous bits, due to the maximum sequence size. """
yters: """
The ability to maintain context is very mysterious. What is most mysterious to me is that if I continuously converse with ChatGPT it has no problem maintaining context. On the other hand, if I break in the middle of a conversation, and then come back to it some time later, it's as if ChatGPT has had a memory wipe and has a lot of trouble regaining the thread of the conversation. Since ChatGPT is a neural network where the output is generated based on the previous 2048 input tokens, the amount of time between responses should have no bearing on the output generated. """
milez: """
Great observation @yters. Could it be a bug: the interface correctly shows the previous conversation, but the service fails to send this to the OpenAI APIs after some time. """

AI: """
It is possible that the OpenAI API may be designed to automatically discard the context after some time to avoid the system being overburdened with large conversations. Another possibility is that the system may be storing the context, but using a different mechanism than the transformer architecture to access it.
"""

Now, I'll ask it to summarize the conversation so far:

You are a user named "AI" in an online forum. Below is a conversation about a question. Please summarize the interaction so far.

[I then pasted in everything from the Question down in the previous prompt/response]

The conversation so far has focused on understanding the technical enablers that allow ChatGPT to maintain context in its answers. Rainb has suggested that this is done through transformer architecture and self-attention mechanisms. Rexcirus has suggested that the whole conversation may be fed as input for the next reply. Yters has then posed the interesting question of why, when the conversation is broken off and then resumed, ChatGPT has difficulty regaining the thread. Milez has suggested that this could be a bug in the interface.

Finally, I delete most of the conversation except for the summary and last few responses and alter the prompt a little to alert it to the presence of the summary:

You are a user named "AI" in an online forum. Below is a summary of the conversation so far about a question, and the last few responses to that question. Please add your contribution to the conversation.

Summary: The conversation so far has focused on understanding the technical enablers that allow ChatGPT to maintain context in its answers. Rainb has suggested that this is done through transformer architecture and self-attention mechanisms. Rexcirus has suggested that the whole conversation may be fed as input for the next reply. Yters has then posed the interesting question of why, when the conversation is broken off and then resumed, ChatGPT has difficulty regaining the thread. Milez has suggested that this could be a bug in the interface.

yters: """
The ability to maintain context is very mysterious. What is most mysterious to me is that if I continuously converse with ChatGPT it has no problem maintaining context. On the other hand, if I break in the middle of a conversation, and then come back to it some time later, it's as if ChatGPT has had a memory wipe and has a lot of trouble regaining the thread of the conversation. Since ChatGPT is a neural network where the output is generated based on the previous 2048 input tokens, the amount of time between responses should have no bearing on the output generated. """
milez: """
Great observation @yters. Could it be a bug: the interface correctly shows the previous conversation, but the service fails to send this to the OpenAI APIs after some time. """

AI: That is an interesting point, @Milez. It could be that the interface does not accurately display the previous conversation to the API, or that there is a bug in the API that causes it to not be able to accurately generate a response with the 2048 input tokens when the conversation is broken off and then resumed. It would be interesting to investigate further.

I've had some success with this method, though it does still lose detail. Improving the prompt to create a more detailed summary helps. It's amazing how much of a conversation is full of repetitive filler, and how concisely it can be summarized. Maybe this is some kind of linguistic compression? :)
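For anyone who wants to reproduce this, here is a minimal sketch of the summarize-and-reinject loop in Python, using the legacy (pre-1.0) openai library's Completion endpoint with text-davinci-003. The SUMMARY_THRESHOLD word budget and the prompt wording are illustrative assumptions to tune, not exact values from my experiments:

```python
import openai  # legacy pre-1.0 client; openai.api_key must be set

SUMMARY_THRESHOLD = 2000  # assumed word budget before compressing history

def complete(prompt: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
        temperature=0.7,
    )
    return response.choices[0].text.strip()

def chat_turn(summary: str, recent: list[str], user_msg: str):
    """Answer using the running summary plus the last few verbatim turns,
    then re-summarize once the verbatim history grows too long."""
    prompt = (
        "Below is a summary of the conversation so far and the last few "
        "messages. Continue the conversation.\n\n"
        f"Summary: {summary}\n\n" + "\n".join(recent) +
        f"\nUser: {user_msg}\nAI:"
    )
    answer = complete(prompt)
    recent = recent + [f"User: {user_msg}", f"AI: {answer}"]
    if len(" ".join(recent).split()) > SUMMARY_THRESHOLD:
        summary = complete(
            "Summarize the following conversation concisely, keeping names "
            f"and key details.\n\nEarlier summary: {summary}\n" +
            "\n".join(recent) + "\n\nSummary:"
        )
        recent = recent[-4:]  # keep only the last two exchanges verbatim
    return answer, summary, recent
```

The trade-off is one extra completion call whenever the history is compressed, which is the price of keeping long conversations inside the token limit.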

RLC