
My understanding of language models is that the model's output is a tensor, so the whole output should be computed all at once. Why, then, can ChatGPT-like models emit one token at a time, as a stream? The time difference between the first and the last token is also significant.
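For context on what I mean: here is a minimal sketch of the token-by-token generation loop as I understand it, using a hypothetical `toy_model` function standing in for a real transformer's forward pass. Each pass does produce a tensor of logits, but only for the *next* token, and the chosen token is fed back in before the following pass:

```python
import random

# Toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def toy_model(tokens):
    # Hypothetical stand-in for one forward pass: returns one
    # logit per vocabulary entry for the NEXT position only.
    rng = random.Random(len(tokens))  # deterministic fake logits
    return [rng.random() for _ in VOCAB]

def generate(prompt, max_new_tokens=5):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_model(tokens)                # one full forward pass
        next_id = max(range(len(VOCAB)), key=logits.__getitem__)  # greedy pick
        next_tok = VOCAB[next_id]
        yield next_tok                            # token can be streamed now
        if next_tok == "<eos>":
            break
        tokens.append(next_tok)                   # feed back for the next pass

for tok in generate(["the"]):
    print(tok)
```

Is this loop the reason the output appears as a stream, with each token only available after another forward pass?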

Bin Wang

0 Answers