My understanding of language models is that the model's output is a tensor, so the whole output should be computed all at once. Why, then, can ChatGPT-like models output one token at a time, like a stream? The time difference between the first and the last output token is also significant.
- The output of a GPT model is a probability for each possible next token (see the sketch below). – user253751 Apr 01 '23 at 08:09
- See https://ai.stackexchange.com/questions/38923/why-does-chatgpt-not-give-the-answer-text-all-at-once/38929#38929 – Rexcirus Apr 01 '23 at 23:20
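To make the mechanism in user253751's comment concrete, here is a minimal sketch of autoregressive decoding. It assumes a generic PyTorch-style model whose forward pass returns logits over the vocabulary at every position; `model`, `tokenizer`, and `eos_id` are hypothetical placeholders, not any specific library's API:

```python
import torch

def generate_stream(model, tokenizer, prompt, max_new_tokens=50):
    # One forward pass only tells us the *next* token, so generation
    # is a loop: append the new token and run the model again.
    ids = tokenizer.encode(prompt)                 # list[int] of token ids
    for _ in range(max_new_tokens):
        x = torch.tensor([ids])                    # shape (1, seq_len)
        logits = model(x)                          # shape (1, seq_len, vocab_size)
        next_id = int(logits[0, -1].argmax())      # greedy pick from the last position
        if next_id == tokenizer.eos_id:            # hypothetical end-of-sequence id
            break
        ids.append(next_id)
        yield tokenizer.decode([next_id])          # emit the token immediately (stream)

# Usage: tokens arrive one at a time, like the ChatGPT stream.
# for piece in generate_stream(model, tokenizer, "Hello"):
#     print(piece, end="", flush=True)
```

Each iteration re-runs the model on the growing sequence, which is why later tokens arrive later: the work is inherently sequential. Real implementations cache attention keys/values so the prefix isn't recomputed from scratch, but the loop structure is the same.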