Questions tagged [rlhf]

For questions related to Reinforcement Learning from Human Feedback (RLHF).

4 questions
2 votes, 1 answer

Why do we need RL in RLHF?

In RLHF, the reward function is a neural network. This means we can compute its gradients cheaply and accurately through backpropagation. Now, we want to find a policy that maximizes reward (see https://arxiv.org/abs/2203.02155). Then, why do we…
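A minimal PyTorch sketch of the issue behind this question (illustrative, not the InstructGPT implementation): the reward model itself is differentiable, but the tokens it scores are produced by sampling from the policy, which breaks the gradient path; policy-gradient methods such as PPO estimate that gradient instead.

```python
import torch

logits = torch.randn(1, 5, requires_grad=True)  # policy logits over a toy 5-token vocabulary
probs = torch.softmax(logits, dim=-1)

# Sampling is discrete: the resulting token carries no gradient, so a
# reward computed on it cannot be backpropagated into the logits.
token = torch.multinomial(probs, num_samples=1)
assert not token.requires_grad

# Policy gradient (REINFORCE) sidesteps this: weight log-probs by reward,
# so E[R * grad log pi] estimates the gradient of expected reward.
reward = 1.0                                    # stand-in for a reward-model score
logprob = torch.log(probs[0, token.item()])
loss = -reward * logprob
loss.backward()                                 # gradients now reach the logits
print(logits.grad)
```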
0 votes, 0 answers

Negative KL-divergence RLHF implementation

I am struggling to understand one part of the FAQ of the Transformer Reinforcement Learning (TRL) library from Hugging Face: "What Is the Concern with Negative KL Divergence?" If you generate text by purely sampling from the model distribution, things work…
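For context on this question, a minimal sketch (an assumed setup, not TRL's actual code) of why the sampled KL penalty can go negative: the per-token estimator log pi(x) - log pi_ref(x) is negative whenever the reference assigns the sampled token more probability than the policy does, even though the true KL is non-negative in expectation.

```python
import torch

pi  = torch.tensor([0.1, 0.9])   # policy distribution over two tokens
ref = torch.tensor([0.6, 0.4])   # reference (pre-RLHF) distribution

token = 0  # e.g. a token forced by decoding constraints rather than sampled from pi
per_token_kl = torch.log(pi[token]) - torch.log(ref[token])
print(per_token_kl)  # ~ -1.79: a negative per-sample estimate

# Averaged under pi, the estimator recovers the true, non-negative KL:
true_kl = (pi * (pi / ref).log()).sum()
print(true_kl)       # ~ 0.55
```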
0 votes, 0 answers

Can pretraining be continued after RLHF?

Assume you have a pretrained transformer language model (M1) which has already undergone reinforcement learning from human feedback (M2). I assume that it is in principle possible to continue the pretraining after RLHF with some additional documents, e.g.…
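A hedged sketch of what continuing pretraining would look like mechanically (the checkpoint name and documents below are hypothetical): an RLHF'd model is still an ordinary causal LM, so the standard next-token objective runs on it; whether this degrades the learned alignment is the open question in the post.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

ckpt = "your-org/model-after-rlhf"  # hypothetical RLHF'd checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding is needed by the collator

docs = Dataset.from_dict({"text": ["Some additional document ...", "Another one ..."]})
tokenized = docs.map(lambda b: tokenizer(b["text"], truncation=True),
                     batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cpt-after-rlhf", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False gives the standard causal-LM (next-token) loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```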
0 votes, 1 answer

What is the difference between fine-tuning and RLHF for LLMs?

I am confused about the difference between fine-tuning and RLHF for LLMs. When to use what? I know RLHF requires creating a reward model which first rates responses to align them with human preferences and afterward uses this reward…
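A minimal sketch contrasting the two objectives asked about here (illustrative shapes only, not a real model): supervised fine-tuning imitates demonstration tokens via cross-entropy, while RLHF first fits a reward model on human preference pairs with the Bradley-Terry loss and then optimizes the policy against it with RL such as PPO (not shown).

```python
import torch
import torch.nn.functional as F

# Supervised fine-tuning: cross-entropy against demonstration tokens.
logits  = torch.randn(4, 100)          # (sequence positions, toy vocabulary)
targets = torch.randint(0, 100, (4,))  # tokens from a human demonstration
sft_loss = F.cross_entropy(logits, targets)

# RLHF step 1: reward model trained on preference pairs with
#   -log sigmoid(r_chosen - r_rejected)
r_chosen   = torch.tensor(1.3)  # reward-model score for the preferred response
r_rejected = torch.tensor(0.2)  # score for the dispreferred response
rm_loss = -F.logsigmoid(r_chosen - r_rejected)

print(sft_loss, rm_loss)  # RLHF step 2 (PPO against the reward model) not shown
```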