Questions tagged [gpt-3]

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive large language model that uses deep learning to produce human-like text. It was introduced by OpenAI in May 2020 and has 175 billion parameters.

Read more on Wikipedia

32 questions
27 votes, 1 answer

What is the "temperature" in the GPT models?

What does the temperature parameter mean when talking about the GPT models? I know that a higher temperature value means more randomness, but I want to know how randomness is introduced. Does temperature mean we add noise to the weights/activations…
Tom Dörr
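For the temperature question above, a minimal sketch of how temperature is typically applied at sampling time may help: the logits are divided by the temperature before the softmax, so the randomness comes from sampling the resulting distribution, not from noise added to weights or activations. This is an illustrative NumPy sketch, not OpenAI's implementation.

    import numpy as np

    def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng()):
        """Turn next-token logits into probabilities and sample one token id.

        Temperature rescales the logits before the softmax: T < 1 sharpens the
        distribution (more deterministic), T > 1 flattens it (more random).
        No noise is added to the model's weights or activations.
        """
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())   # subtract max for numerical stability
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    # Toy example with three candidate tokens
    logits = np.array([2.0, 1.0, 0.1])
    print(sample_with_temperature(logits, temperature=0.2))  # almost always token 0
    print(sample_with_temperature(logits, temperature=2.0))  # far more varied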
15 votes, 1 answer

What language is the GPT-3 engine written in?

I know that the API is Python-based, but what is the GPT-3 engine mostly written in? C? C++? I'm having some trouble finding this info.
8 votes, 2 answers

Is GPT-4 based on GPT-3, or was it trained from scratch?

To me it looks like GPT-4 is based on GPT-3. On the other hand, there were rumors that the training of GPT-3 was done with errors, but retraining was impossible due to the cost.
Anixx
8 votes, 2 answers

Are GPT-3.5 series models based on GPT-3?

In the official blog post about ChatGPT from OpenAI, there is this paragraph explaining how the ChatGPT model was trained: We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with…
iMad
8 votes, 1 answer

What causes ChatGPT to generate responses that refer to itself as a bot or LM?

ChatGPT occasionally generates responses to prompts that refer to itself as a "bot" or "language model." For instance, when given a certain input (the first paragraph of this question) ChatGPT produces (in part) the output: It is not appropriate…
4 votes, 1 answer

What's the difference between GPT-3.5 and InstructGPT?

I read about the different GPT-3.5 model series here: https://platform.openai.com/docs/models/gpt-3-5. At the beginning of the page, it suggests looking at https://platform.openai.com/docs/model-index-for-researchers to understand the difference…
Arya
4 votes, 0 answers

How is ChatGPT maintaining context?

It has been suggested in the answer to this earlier question that it is just remembering a certain amount of recent information. The reference used is this post by OpenAI which says that ChatGPT should only be able to maintain a context of around…
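As a rough illustration of the "remembering recent information" explanation referenced above, here is a minimal sketch of how a chat front end can appear to maintain context: it simply resends a truncated window of the conversation with every request. The token budget and the whitespace token counter are assumptions for illustration, not OpenAI's actual limit or tokenizer.

    MAX_CONTEXT_TOKENS = 4096  # assumed budget, for illustration only

    def count_tokens(text):
        # crude stand-in for a real tokenizer such as tiktoken
        return len(text.split())

    def build_context(history, user_message):
        """Keep only the most recent turns that fit in the assumed token budget."""
        turns = history + [("user", user_message)]
        kept, used = [], 0
        for role, text in reversed(turns):
            cost = count_tokens(text)
            if used + cost > MAX_CONTEXT_TOKENS:
                break  # older turns are silently dropped, i.e. "forgotten"
            kept.append((role, text))
            used += cost
        return list(reversed(kept))  # oldest-first, ready to resend with the next request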
4 votes, 1 answer

How to get GPT-3 to translate a specific word in a sentence?

I just gave GPT-3 the following prompt (in the playground, using text-davinci-001 with default settings): What's the German word for "can" in the sentence "The man removes the can."? The word "can" in this sentence is obviously a noun and not a…
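One hedged approach to the word-in-context question above is to spell out the disambiguation with a few-shot prompt. The wording and examples below are illustrative only, not a verified recipe for text-davinci-001:

    # Illustrative few-shot prompt; the example translations are for demonstration.
    prompt = (
        'Translate the quoted word as it is used in the sentence.\n\n'
        'Sentence: "She can swim fast." Word: "can" -> German: "kann" (verb)\n'
        'Sentence: "He opened a can of soup." Word: "can" -> German: "Dose" (noun)\n'
        'Sentence: "The man removes the can." Word: "can" -> German:'
    )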
4 votes, 3 answers

How can GPT-3 be used for designing electronic circuits from text descriptions?

I was wondering if it is possible to use GPT-3 to translate a text description of a circuit into any circuit design language, which in turn can be used to make the circuit. If it is possible, what approach would you suggest?
2 votes, 1 answer

If GPT-3 is trained on predicting the next token, how is it able to take commands?

From my understanding, GPT-3 is trained on predicting the next token from a sequence of tokens. Given this, how is it able to take commands? For instance, in this example input, wouldn't the statistically most likely prediction be to insert a period…
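For the question above, a minimal sketch may make the point concrete: "taking a command" is still ordinary next-token prediction, because the instruction is just part of the conditioning text. GPT-3's weights are not publicly available, so GPT-2 via the Hugging Face transformers library stands in here:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # The "command" is only a prefix; the model keeps predicting likely next tokens.
    prompt = "Translate English to French:\nsea otter -> loutre de mer\ncheese ->"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))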
2 votes, 2 answers

How do transformers understand data and answer custom questions?

I recently heard of GPT-3 and I don't understand how attention models and transformer encoders and decoders work. I heard that GPT-3 can make a website from a description and write perfectly factual essays. How can it understand our world using…
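A compact way to see what "attention" does, relevant to the question above, is a single-head scaled dot-product self-attention sketch in NumPy: each token's output is a weighted mix of every token's value vector, with the weights derived from query/key similarity. This is a didactic sketch, not GPT-3's actual multi-head, multi-layer implementation.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Single-head scaled dot-product self-attention over a token sequence X."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
        return weights @ V  # each row mixes information from all tokens

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))                        # 4 toy tokens, width 8
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)         # (4, 8)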
1 vote, 1 answer

Repainting a picture in the style of some painter (or of another picture)

It sounds like a straightforward task for DALL-E (and GPT?) to present a painting and ask to repaint it "in the style of Leonardo da Vinci", just as one can present a text and ask to rewrite it in the style of some author. Or even better: to present…
Hans-Peter Stricker
1 vote, 0 answers

How much do we know about the architectures of the Codex (prototype) models?

The transformer model Codex by OpenAI was introduced in a 2021 paper. The paper does not give complete information about the architecture. Below I've quoted all the passages in the paper that give hints as to the architecture: ...we hypothesized…
Jack M
1 vote, 0 answers

Computation required for a GPT model to choose the likely word from n options where n < total vocabulary size

Let's imagine two different use cases for an LLM/GPT-3: (1) predicting the next most likely word in a sequence using all ~50k words in its dictionary (i.e. the standard method of prompting an LLM); (2) checking whether "Word-1" is more likely than "Word-2" to…
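For the second use case above (checking whether "Word-1" is more likely than "Word-2"), a single forward pass over the prompt is enough: compare the two candidates' scores at the last position instead of ranking the whole vocabulary. A hedged sketch using GPT-2 through Hugging Face transformers (GPT-3 itself is only reachable through the API, e.g. via its logprobs option):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def more_likely(prompt, word_a, word_b):
        """Return whichever candidate word the model scores higher as the next token.
        Assumes each candidate maps to a single token (the leading space matters for BPE)."""
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(input_ids=ids).logits[0, -1]   # scores for the next position only
        log_probs = torch.log_softmax(logits, dim=-1)
        a_id = tok(" " + word_a, add_special_tokens=False).input_ids[0]
        b_id = tok(" " + word_b, add_special_tokens=False).input_ids[0]
        return word_a if log_probs[a_id] > log_probs[b_id] else word_b

    print(more_likely("The cat sat on the", "mat", "car"))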
1 vote, 1 answer

Fine-tune GPT-Neo with prompt and completion?

I'm new to AI and machine learning. To fine-tune GPT-3, I understand that we need a set of training examples that each consist of a single input ("prompt") and its associated output ("completion"). I have prepared a dataset with "prompt" and…
SoftTimur
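For the fine-tuning question above, a hedged sketch of the prompt/completion dataset format and of one common way to feed it to a plain causal LM such as GPT-Neo (concatenating prompt and completion into a single training string) may help; the examples and the end-of-text marker are assumptions for illustration:

    import json

    # Illustrative examples in the prompt/completion JSONL style used for fine-tuning.
    examples = [
        {"prompt": "Translate to French: cat ->", "completion": " chat"},
        {"prompt": "Translate to French: dog ->", "completion": " chien"},
    ]

    with open("train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

    # GPT-Neo is an ordinary causal LM, so one common approach is to concatenate
    # prompt and completion and fine-tune on next-token prediction of the result.
    texts = [ex["prompt"] + ex["completion"] + "<|endoftext|>" for ex in examples]
    print(texts[0])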