For questions related to GPT (short for Generative Pre-Training), which combines the transformer architecture (proposed in "Attention Is All You Need") with unsupervised pre-training for solving language tasks such as machine translation. GPT was proposed in "Improving Language Understanding by Generative Pre-Training" (2018) by OpenAI. There is also GPT-2, which was proposed in "Language Models are Unsupervised Multitask Learners" (2019) by OpenAI.
Questions tagged [gpt]
77 questions
27
votes
4 answers
Why is ChatGPT bad at math?
As opposed to How does ChatGPT know math?, I've been seeing some things floating around the Twitterverse about how ChatGPT can actually be very bad at math. For instance, I asked it "If it takes 5 machines 5 minutes to make 5 devices, how long would…

Mithical
- 2,885
- 5
- 27
- 39
27
votes
1 answer
What is the "temperature" in the GPT models?
What does the temperature parameter mean when talking about the GPT models?
I know that a higher temperature value means more randomness, but I want to know how randomness is introduced.
Does temperature mean we add noise to the weights/activations…

Tom Dörr
- 393
- 1
- 3
- 7
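A minimal sketch of how the temperature parameter is commonly applied when sampling from GPT-style models (illustrative NumPy, not OpenAI's actual implementation): the logits are divided by the temperature before the softmax, so the randomness comes from the sampling step rather than from noise added to weights or activations.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token id from logits after temperature scaling.

    Dividing the logits by the temperature before the softmax is the usual
    way randomness is controlled: T < 1 sharpens the distribution, T > 1
    flattens it. No noise is added to weights or activations.
    """
    rng = rng if rng is not None else np.random.default_rng()
    scaled = logits / max(temperature, 1e-8)          # guard against T = 0
    probs = np.exp(scaled - scaled.max())             # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# The same logits become near-deterministic at low temperature.
logits = np.array([2.0, 1.0, 0.5, -1.0])
print(sample_next_token(logits, temperature=0.2))
print(sample_next_token(logits, temperature=1.5))
```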
26
votes
1 answer
What exactly are the "parameters" in GPT-3's 175 billion parameters and how are they chosen/generated?
When I studied neural networks, parameters were things like the learning rate and batch size. But even GPT-3's arXiv paper does not mention what exactly the parameters are; it only gives a small hint that they might just be sentences.
Even tutorial…

Nav
- 481
- 1
- 5
- 10
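A minimal sketch of what "parameters" means in this context, assuming the usual definition of learned weights and biases rather than hyperparameters such as learning rate or batch size (the layer sizes below are illustrative):

```python
import numpy as np

# "Parameters" here means learned weights, not hyperparameters like learning rate.
# A single fully connected layer mapping 768 -> 3072 features already has:
in_features, out_features = 768, 3072
weights = np.zeros((in_features, out_features))   # learned by gradient descent
biases = np.zeros(out_features)                   # also learned
print(weights.size + biases.size)                 # 2,362,368 parameters in one layer
```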
16
votes
2 answers
Why does GPT-2 Exclude the Transformer Encoder?
After looking into transformers, BERT, and GPT-2, from what I understand, GPT-2 essentially uses only the decoder part of the original transformer architecture and uses masked self-attention that can only look at prior tokens.
Why does GPT-2 not…

Athena Wisdom
- 311
- 2
- 5
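A minimal sketch of the masked (causal) self-attention mentioned above, assuming standard single-head scaled dot-product attention; the NumPy code and sizes are illustrative:

```python
import numpy as np

def causal_self_attention(Q, K, V):
    """Scaled dot-product attention with a causal (lower-triangular) mask.

    Position i can only attend to positions <= i, which is what lets a
    decoder-only model like GPT-2 be trained to predict the next token.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq, seq) attention scores
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)             # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

seq_len, d_model = 4, 8
x = np.random.randn(seq_len, d_model)
print(causal_self_attention(x, x, x).shape)           # (4, 8)
```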
10
votes
1 answer
How does the (decoder-only) transformer architecture work?
How does the (decoder-only) transformer architecture, used in impressive models such as GPT-4, work?

Robin van Hoorn
- 1,810
- 7
- 32
8
votes
2 answers
Is GPT-4 based on GPT-3 or was it trained from scratch?
To me it looks like GPT-4 is based on GPT-3.
On the other hand, there were rumors that the training of GPT-3 was done with errors, but retraining was impossible due to the cost.

Anixx
- 301
- 8
7
votes
2 answers
What is the difference between the positional encoding techniques of the Transformer and GPT?
I know the original Transformer and GPT (1-3) use two slightly different positional encoding techniques.
More specifically, the GPT papers say the positional encoding is learned. What does that mean? OpenAI's papers don't go into much detail.
How…

Leevo
- 285
- 1
- 9
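A minimal sketch contrasting the two techniques referred to above, assuming the standard formulations: fixed sinusoidal encodings (original Transformer) versus a trainable position-embedding table ("learned", as in GPT). Shapes and values are illustrative:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal encodings from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

# "Learned" positional encoding: a trainable table of shape (max_len, d_model),
# initialized randomly and updated by gradient descent like any other weight.
max_len, d_model = 1024, 64
learned_positions = np.random.normal(scale=0.02, size=(max_len, d_model))

print(sinusoidal_positions(8, d_model).shape, learned_positions[:8].shape)
```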
7
votes
1 answer
How do we know if GPT-2 is a better language model?
You may have heard of GPT-2, a new language model. It has recently attracted attention from the general public, as the organization that published the paper, OpenAI, ironically refused to share the whole model, fearing dangerous implications. Along the…

Lucas Morin
- 232
- 2
- 11
6
votes
5 answers
How is GPT-4 able to solve math?
How can GPT-4 solve complex calculus and other math problems? I believe these problems require analytical reasoning and the ability to compute numbers. Does it still use an LLM to complete this process, or does it add something on top of it?
Here is the link to the…

desert_ranger
- 586
- 3
- 19
5
votes
2 answers
Where can I find pre-trained language models in English and German?
Where can I find (more) pre-trained language models? I am especially interested in neural network-based models for English and German.
I am aware only of Language Model on One Billion Word Benchmark and TF-LM: TensorFlow-based Language Modeling…

Lutz Büch
- 161
- 7
5
votes
1 answer
How does a GPT-based language model like ChatGPT determine the n-th letter of a word?
I understand that GPT models process input text by converting words into tokens and then into embedding vectors, rather than processing them letter by letter. Given this approach, I am curious to know how a model like ChatGPT can identify the first (or n-th)…

Peyman
- 534
- 3
- 10
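A minimal sketch of why letter-level questions are awkward for token-based models, using a toy greedy longest-match tokenizer (the vocabulary and matching rule are illustrative; real GPT models use learned BPE merges):

```python
# Toy subword vocabulary; real GPT tokenizers (BPE) are learned from data.
vocab = {"straw": 0, "berry": 1, "straws": 2, " ": 3}

def toy_tokenize(text):
    """Greedy longest-match tokenization into subword ids.

    The model only ever sees these integer ids (then their embeddings),
    so individual letters are not directly represented in its input.
    """
    ids = []
    while text:
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(vocab[piece])
                text = text[len(piece):]
                break
        else:
            raise ValueError("out-of-vocabulary character: " + text[0])
    return ids

print(toy_tokenize("strawberry"))   # [0, 1] -- two tokens, not ten letters
```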
5
votes
2 answers
How is the next token predicted in transformers?
In the transformer (or GPT/decoder-only models), at the end of the decoder blocks but before the final linear layer, you have X vectors (one for each of the X tokens at the input of the decoder). We then want to compute the probabilities for the next token of the…

Miguel Carvalho
- 51
- 1
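A minimal sketch of the step described above, assuming a standard output head: only the hidden vector at the last position is projected to vocabulary-sized logits and normalized with a softmax (toy NumPy, illustrative sizes):

```python
import numpy as np

# Toy sizes; real models use much larger values.
seq_len, d_model, vocab_size = 5, 16, 100

# Hidden states coming out of the last decoder block: one vector per input token.
hidden = np.random.randn(seq_len, d_model)

# The output head: a linear projection to vocabulary size (often tied to the embedding matrix).
W_out = np.random.randn(d_model, vocab_size) * 0.02

# Only the hidden state at the *last* position is needed to predict the next token.
logits = hidden[-1] @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()
next_token_id = int(np.argmax(probs))               # greedy choice; sampling is also common
print(next_token_id, probs.shape)                   # probs has one entry per vocabulary token
```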
5
votes
1 answer
What can GPT-4 do linguistics-wise?
I have no access to GPT-4, but I wonder whether it can do the following (where ChatGPT failed).
Make syntactic and morphological analysis of sentences in a language like Russian, marking cases, parts of speech and sentence, conjugations of verbs,…

Anixx
- 301
- 8
5
votes
1 answer
Is the Mask Needed for Masked Self-Attention During Inference with GPT-2?
My understanding is that masked self-attention is necessary during training of GPT-2, as otherwise it would be able to directly see the correct next output at each iteration. My question is whether the attention mask is necessary, or even possible,…

D_s
- 51
- 3
4
votes
2 answers
What sort of computer would be necessary to run queries on an LLM?
I've heard that to train a model like GPT-4 you need a very powerful computer and ~$10M of computing power, but once you've produced the trained ~570 GB model, what sort of computing power is necessary to execute specific queries with it?

ak0000
- 195
- 1
- 8
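A rough back-of-the-envelope sketch of the memory needed just to hold the weights at inference time, assuming the 175-billion-parameter count from the GPT-3 paper and common storage precisions (illustrative only; activations, the KV cache, and batching add more on top):

```python
# Back-of-the-envelope memory needed just to hold model weights for inference.
# Assumed values: parameter count from the GPT-3 paper, common storage precisions.
params = 175e9                      # GPT-3 has 175 billion parameters
bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

for precision, nbytes in bytes_per_param.items():
    gib = params * nbytes / 2**30
    print(f"{precision}: ~{gib:,.0f} GiB of memory for the weights alone")
```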