My understanding of language models is that the model's output is a tensor, so the whole output should be computed all at once. Why, then, can ChatGPT-like models output one token at a time, like a stream? The time difference between the first and the last output token is also significant.
- The output of a GPT model is a probability for each possible next token (see the sketch below). – user253751 Apr 01 '23 at 08:09
- See https://ai.stackexchange.com/questions/38923/why-does-chatgpt-not-give-the-answer-text-all-at-once/38929#38929 – Rexcirus Apr 01 '23 at 23:20
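To make the mechanism in user253751's comment concrete, here is a minimal sketch of autoregressive decoding. It assumes a generic PyTorch-style model whose forward pass returns logits over the vocabulary at every position; `model`, `tokenizer`, and `eos_id` are hypothetical placeholders, not any specific library's API:

```python
import torch

def generate_stream(model, tokenizer, prompt, max_new_tokens=50):
    # One forward pass only tells us the *next* token, so generation
    # is a loop: append the new token and run the model again.
    ids = tokenizer.encode(prompt)                 # list[int] of token ids
    for _ in range(max_new_tokens):
        x = torch.tensor([ids])                    # shape (1, seq_len)
        logits = model(x)                          # shape (1, seq_len, vocab_size)
        next_id = int(logits[0, -1].argmax())      # greedy pick from the last position
        if next_id == tokenizer.eos_id:            # hypothetical end-of-sequence id
            break
        ids.append(next_id)
        yield tokenizer.decode([next_id])          # emit the token immediately (stream)

# Usage: tokens arrive one at a time, like the ChatGPT stream.
# for piece in generate_stream(model, tokenizer, "Hello"):
#     print(piece, end="", flush=True)
```

Each iteration re-runs the model on the growing sequence, which is why later tokens arrive later: the work is inherently sequential. Real implementations cache attention keys/values so the prefix isn't recomputed from scratch, but the loop structure is the same.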