According to OpenAI, ChatGPT is trained in a 3-step process.
Are the steps where human AI trainers are involved, i.e. training the initial policy and providing the A>B>C>D gradings as training sets for the reward model, the ONLY places where actual knowledge enters the model? Is there no other learning step, one where human trainers are NOT involved, in which the model learns from high-quality sources like authoritative texts? The sampled prompts in step 2 are supposed to cover EVERYTHING on the internet, so what happens when the trainers don't know anything about the topic? Did OpenAI invite domain experts to grade specific prompt-answer pairs?

- This is impossible to answer as there is no more public information on how ChatGPT was trained. – Dr. Snoopy Mar 13 '23 at 11:15
- I think their prompt dataset contains half the Internet. – user253751 Mar 13 '23 at 22:08
1 Answer
Step 2 is (mainly) useful for tuning the style, alignment and general feel of ChatGPT's answers, not the specifics of what is answered.
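To make that concrete: the A>B>C>D rankings from step 2 are typically split into pairwise comparisons and used to train a reward model with a ranking loss, as described in the InstructGPT paper. Here is a minimal PyTorch sketch; the tiny reward model, the 16-dim embeddings and the tensors are made up for illustration and are not OpenAI's actual setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy reward model: maps a fixed-size answer embedding to a scalar score.
# The architecture and embedding size are illustrative only.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

def pairwise_ranking_loss(preferred, rejected):
    """Bradley-Terry style loss: push the score of the human-preferred
    answer above the score of the rejected one."""
    return -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

# A human ranking A > B > C decomposes into the pairs (A,B), (A,C), (B,C).
a, b, c = (torch.randn(1, 16) for _ in range(3))
loss = sum(pairwise_ranking_loss(w, l) for w, l in [(a, b), (a, c), (b, c)])
loss.backward()  # gradients would drive an optimizer step on the reward model
```

Note that nothing in this loss injects facts: it only teaches the reward model which of two already-generated answers humans preferred.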
The actual "knowledge" comes simply from ingesting a large corpora of text as training dataset and using unsupervised learning technques on it. This knowledge has been encapsulated in the GPT3.5 on which ChatGPT is built on. To be super clear: ChatGPT is a specialised version of GPT3.5.
– Rexcirus