Questions tagged [language-model]
For questions related to the concept of a language model, which is a probability distribution over sequences of words (for example, of a natural language such as English).
68 questions
46 votes · 3 answers
Was ChatGPT trained on Stack Overflow data?
Has ChatGPT used highly rated and upvoted questions/answers from Stack Overflow in its training data?
To me, it makes complete sense to take answers that have upwards of 100 upvotes and include them in your training data, but people around me seem…

Nicolas Zein
27 votes · 1 answer
What is the "temperature" in the GPT models?
What does the temperature parameter mean when talking about the GPT models?
I know that a higher temperature value means more randomness, but I want to know how randomness is introduced.
Does temperature mean we add noise to the weights/activations…

Tom Dörr
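Since this question asks how the randomness is actually introduced, here is a minimal sketch of the conventional mechanism: no noise is added to weights or activations; the logits are simply divided by the temperature before the softmax, which sharpens or flattens the sampling distribution. The logits below are toy values.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token id after temperature-scaling the logits.

    T < 1 sharpens the distribution (more deterministic);
    T > 1 flattens it (more random). The weights are untouched.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                        # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Toy 4-token vocabulary: low temperature almost always picks the
# argmax (token 0); high temperature spreads the choices out.
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_with_temperature(logits, temperature=0.2))
print(sample_with_temperature(logits, temperature=2.0))
```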
23 votes · 4 answers
How does ChatGPT know math?
ChatGPT is a language model. As far as I know, and if I'm not wrong, it gets text as tokens and word embeddings. So how can it do math? For example, I asked:
ME: Which one is bigger 5 or 9.
ChatGPT: In this case, 9 is larger than 5.
One can say,…

Peyman
13 votes · 2 answers
Why does ChatGPT not give the answer text all at once?
When ChatGPT is generating an answer to my question, it generates it word by word.
So I actually have to wait until I get the final answer.
Is this just for show?
Or is it really generating the answer in real time, word by word, not knowing yet what the…

Sander van den Oord
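The standard mental model here is autoregressive decoding: each new token requires a forward pass conditioned on everything generated so far, so the model genuinely does not know token t+1 before token t exists. A minimal sketch, with a hypothetical `toy_model` callable standing in for the real network:

```python
def generate(model, prompt_ids, max_new_tokens=20, eos_id=0):
    """Greedy autoregressive decoding: one forward pass per new token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)                    # depends on all tokens so far
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        ids.append(next_id)
        yield next_id                          # stream it as soon as it exists

def toy_model(ids):
    # Hypothetical stand-in: always prefers the token after the last one.
    logits = [0.0] * 10
    logits[(ids[-1] + 1) % 10] = 1.0
    return logits

print(list(generate(toy_model, [3], max_new_tokens=5)))  # [4, 5, 6, 7, 8]
```

So the word-by-word display is generally not just for show; streaming each token as it is produced is the natural output mode of this loop.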
8 votes · 1 answer
What causes ChatGPT to generate responses that refer to itself as a bot or LM?
ChatGPT occasionally generates responses to prompts that refer to itself as a "bot" or "language model."
For instance, when given a certain input (the first paragraph of this question) ChatGPT produces (in part) the output:
It is not appropriate…

Obie 2.0
7 votes · 2 answers
Why can't language models, like GPT-3, continuously learn once trained?
GPT-3 has a prompt limit of roughly 2048 "tokens", where a token corresponds to about 4 characters of text. If my understanding is correct, a deep neural network is not learning after it is trained and is used to produce an output, and, as such, this…

MaiaVictor
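The premise of the question, that a deployed network's weights are frozen, matches how inference is typically served: weights change only when an optimizer applies gradient updates, and serving code never runs one. A minimal PyTorch sketch of the distinction, with a stand-in linear layer:

```python
import torch

model = torch.nn.Linear(4, 2)      # stand-in for a trained network
model.eval()                       # inference mode

with torch.no_grad():              # no gradients tracked, no weights updated
    out = model(torch.randn(1, 4))

# Learning would require an explicit training step, e.g.:
#   loss.backward(); optimizer.step()
# which inference-serving code for a model like GPT-3 never executes.
```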
7 votes · 1 answer
How to use BERT as a multi-purpose conversational AI?
I'm looking to make an NLP model that can achieve a dual purpose. One purpose is to hold interesting conversations (conversational AI); the other is to do intent classification and even accomplish the classified task.
To…

junfanbl
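For the intent-classification half, a common approach is to put a classification head on BERT, e.g. via the Hugging Face `transformers` library. A minimal sketch: the checkpoint and the three intent labels are placeholders, and the head below is untrained (a real system would fine-tune it on an intent dataset). Note that BERT is an encoder, so the open-ended conversational half usually calls for a separate generative model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,   # e.g. 0=chat, 1=set_alarm, 2=play_music (placeholders)
)

inputs = tokenizer("Wake me up at 7 tomorrow", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, 3)
print(logits.argmax(dim=-1).item())            # predicted intent id
```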
6 votes · 1 answer
How was ChatGPT trained?
I know that large language models like GPT-3 are trained simply to continue pieces of text that have been scraped from the web. But how was ChatGPT trained, which, while also having a good understanding of language, is not directly a language model,…

HelloGoodbye
6 votes · 1 answer
What are pros and cons of Bi-LSTM as compared to LSTM?
What are the pros and cons of LSTM vs Bi-LSTM in language modelling? What was the need to introduce Bi-LSTM?

DRV
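The core trade-off is visible directly in PyTorch's built-in LSTM: a Bi-LSTM runs a second pass right-to-left and concatenates both hidden states, so every position sees past and future context. That helps tagging and classification, but it cannot be used for standard next-word language modelling, where the future words are exactly what is being predicted. A minimal sketch with toy dimensions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 5, 8)   # (batch, seq_len, input_features)

uni = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
bi = nn.LSTM(input_size=8, hidden_size=16, batch_first=True,
             bidirectional=True)

out_uni, _ = uni(x)
out_bi, _ = bi(x)
print(out_uni.shape)  # torch.Size([1, 5, 16])
print(out_bi.shape)   # torch.Size([1, 5, 32]) -- both directions concatenated
```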
5 votes · 2 answers
Where can I find pre-trained language models in English and German?
Where can I find (more) pre-trained language models? I am especially interested in neural network-based models for English and German.
I am aware only of Language Model on One Billion Word Benchmark and TF-LM: TensorFlow-based Language Modeling…

Lutz Büch
5 votes · 1 answer
How does GPT-based language model like ChatGPT determine the n-th letter of a word?
I understand that GPT models process input text by converting words into tokens and then into embedding vectors, and do not process them letter by letter. Given this approach, I am curious to know how a model like ChatGPT can identify the first (or n-th)…

Peyman
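The crux can be seen by running a GPT-style tokenizer directly. A minimal sketch, assuming the Hugging Face `transformers` GPT-2 tokenizer: the model receives ids for multi-character chunks, never individual letters, so letter-position questions can only be answered from spellings it has effectively memorized during training.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
word = "strawberry"
print(tok.tokenize(word))        # multi-character BPE pieces, not letters
print(tok(word)["input_ids"])    # the ids the model actually sees

# The n-th letter is never an explicit input feature of these ids.
```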
5 votes · 2 answers
How is the next token predicted in transformers?
In the transformer (or GPT/decoder only), at the end of the decoder blocks but before the final linear layer you have X vectors (for the X tokens at the input of the decoder). We then want to compute the probabilities for the next token of the…

Miguel Carvalho
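The step after the decoder blocks is standard across GPT-style models: only the hidden vector at the last position is projected by the final linear layer (the "LM head") to vocabulary logits, and a softmax turns those into the next-token distribution. (During training, every position's vector is projected the same way to predict its own next token in parallel.) A minimal sketch with toy dimensions:

```python
import torch

seq_len, d_model, vocab = 5, 16, 100
hidden = torch.randn(seq_len, d_model)     # output of the decoder stack
lm_head = torch.nn.Linear(d_model, vocab)  # the final linear layer

logits = lm_head(hidden[-1])               # last position's vector only
probs = torch.softmax(logits, dim=-1)      # distribution over the vocabulary
print(probs.shape, float(probs.sum()))     # torch.Size([100]) ~1.0
```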
5 votes · 1 answer
How can a language model keep track of the provenance of the main knowledge/sources used to generate a given output?
One of the main criticisms against the use of ChatGPT on Stack Exchange is that it doesn't attribute the main knowledge/sources used to generate a given output. How can a language model keep track of the provenance of the main knowledge/sources used…

Franck Dernoncourt
5 votes · 2 answers
What is the difference between a language model and a word embedding?
I am self-studying applications of deep learning to NLP and machine translation.
I am confused about the concepts of "Language Model", "Word Embedding", "BLEU Score".
It appears to me that a language model is a way to predict the next word given…

Exploring
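The distinction can be made concrete in a few lines: a word embedding is a lookup table from token ids to vectors, while a language model maps a context to a probability distribution over the next token, and typically contains an embedding as its first layer. A minimal sketch with toy sizes (the mean-pooled "context" is a deliberate oversimplification):

```python
import torch
import torch.nn as nn

vocab, dim = 50, 8

# Word embedding: a lookup table, id -> vector. No probabilities involved.
embedding = nn.Embedding(vocab, dim)
print(embedding(torch.tensor([3])).shape)        # torch.Size([1, 8])

# Toy language model: context vector -> P(next token | context).
lm_head = nn.Linear(dim, vocab)
context = embedding(torch.tensor([3, 7, 9])).mean(dim=0)
probs = torch.softmax(lm_head(context), dim=-1)
print(probs.shape, float(probs.sum()))           # torch.Size([50]) ~1.0
```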
4 votes · 2 answers
What makes reproducing a model like GPT3/GPT3.5/ChatGPT difficult?
Is it difficult for other companies to train a model similar to ChatGPT, and what makes it difficult? What is challenging about reproducing the results obtained by OpenAI with ChatGPT/GPT3.5? Would it be possible for a company like Meta or Google to…

Robin van Hoorn