Questions tagged [instruct-gpt]
5 questions
5
votes
2 answers
InstructGPT: What is the sigma in the loss function and why is $\log(\cdot)$ used?
$$ \operatorname{loss}(\theta) = -\frac{1}{\binom{K}{2}}E_{(x,y_w,y_l)\sim D}[\log(\sigma(r_{\theta}(x, y_w) - r_{\theta}(x, y_l)))] $$
The equation was taken…
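For readers skimming this listing, here is a minimal sketch of how the loss above could be evaluated numerically, assuming that $\sigma$ denotes the logistic sigmoid and that the reward model outputs are already available as plain floats. The names `log_sigmoid` and `pairwise_loss` are hypothetical helpers, not taken from the paper, and averaging over the supplied pairs stands in for both the expectation and the $1/\binom{K}{2}$ factor.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigma(z)), where sigma is the logistic sigmoid."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_loss(pairs):
    """Average of -log(sigma(r_w - r_l)) over (r_w, r_l) reward pairs.

    r_w is the reward of the preferred completion y_w and r_l the reward
    of the rejected completion y_l (hypothetical scalar values here).
    """
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)


# Illustrative values only: equal rewards, correct ordering, wrong ordering.
loss_equal = pairwise_loss([(0.0, 0.0)])
loss_good = pairwise_loss([(3.0, -1.0)])
loss_bad = pairwise_loss([(-1.0, 3.0)])
```

With equal rewards the loss equals $\log 2$; it shrinks toward 0 as the preferred completion's reward grows relative to the rejected one, and grows without bound when the ordering is reversed.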

Nathan B
- 161
- 2
4
votes
1 answer
What's the difference between GPT-3.5 and InstructGPT?
I read about the different model series in GPT-3.5 here - https://platform.openai.com/docs/models/gpt-3-5
At the beginning of the page, it says to look at https://platform.openai.com/docs/model-index-for-researchers to understand the difference…

Arya
- 41
- 2
1
vote
1 answer
Repainting a picture in the style of some painter (or of another picture)
It sounds like a straightforward task for DALL-E (and GPT?): present a painting and ask it to repaint it "in the style of Leonardo da Vinci", just as one can present texts and ask to have them rewritten in the style of some author. Or even better: to present…

Hans-Peter Stricker
- 811
- 1
- 8
- 20
1
vote
0 answers
What does "shuffle the comparisons into one dataset" mean?
I couldn't understand the wording here.
What does "shuffle the comparisons into one dataset" mean?
How does the method they use avoid needing $\binom{K}{2}$ forward passes for $K$ completions? Do they update $\binom{K}{2}$ in an epoch for $K$ completions or…
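One possible reading of the question, sketched under stated assumptions: each of the $K$ completions for a prompt is scored once (so $K$ forward passes), and all $\binom{K}{2}$ comparisons are then formed from those cached scores inside a single batch element. The reward values below are made-up placeholders for reward model outputs, `prompt_loss` is a hypothetical helper, and ties between completions are ignored.

```python
import math
from itertools import combinations


def log_sigmoid(z):
    """Numerically stable log of the logistic sigmoid."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def prompt_loss(rewards_ranked):
    """Loss for one prompt from K cached rewards, ordered best to worst.

    combinations() keeps input order, so in every pair the first element
    belongs to the completion the labeler ranked higher.
    """
    pairs = list(combinations(rewards_ranked, 2))  # K choose 2 pairs
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)


# Hypothetical rewards from K = 4 forward passes, one per completion.
rewards = [2.0, 1.0, 0.5, -1.0]
n_pairs = math.comb(len(rewards), 2)
loss = prompt_loss(rewards)
```

Here 4 scores yield 6 comparisons without scoring any completion twice, which is the saving the question asks about under this reading.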
0
votes
1 answer
How can InstructGPT be a fine-tuned version of GPT-3 and at the same time have fewer parameters than the original GPT-3?
I am reading the paper "Training language models to follow instructions with human feedback".
It says:
Our labelers provide demonstrations of the desired behavior on the input prompt distribution (see Section 3.2 for details on this distribution).…

DanielTheRocketMan
- 113
- 4