I am confused about the difference between fine-tuning and RLHF for LLMs. I know that RLHF requires training a reward model, which first rates responses to capture human preferences, and that this reward model is then used to fine-tune the base model.
But if that's the case, is plain fine-tuning still relevant? When should I use which?
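To make my question concrete, here is my rough mental model of the reward-model step as a toy sketch (entirely hypothetical: a "response" is just a feature vector, the reward model is a linear scorer, and it's trained on human preference pairs with a Bradley-Terry-style logistic loss; real RLHF would then fine-tune the LLM against this learned reward):

```python
import math

def reward(w, x):
    # Reward model: linear score of response features x under weights w.
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # pairs: list of (preferred_features, rejected_features) from human labelers.
    # Bradley-Terry / logistic loss: the preferred response should score higher.
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            margin = reward(w, good) - reward(w, bad)
            # Gradient of -log(sigmoid(margin)) w.r.t. w is
            # -(1 - sigmoid(margin)) * (good - bad); we ascend the opposite way.
            g = 1.0 / (1.0 + math.exp(margin))  # = 1 - sigmoid(margin)
            for i in range(dim):
                w[i] += lr * g * (good[i] - bad[i])
    return w

# Hypothetical preference data: feature[0] = "helpfulness", feature[1] = "verbosity".
# The labelers prefer helpful, concise responses.
pairs = [
    ([1.0, 0.2], [0.1, 0.9]),
    ([0.9, 0.1], [0.3, 0.8]),
    ([0.8, 0.3], [0.2, 0.7]),
]
w = train_reward_model(pairs, dim=2)

# Step 2 of RLHF would fine-tune the LLM to maximize this learned reward;
# here we just use it to rank two candidate responses.
helpful = [0.95, 0.15]
rambling = [0.2, 0.85]
assert reward(w, helpful) > reward(w, rambling)
```

Is this two-stage picture (learn a reward from preferences, then optimize the model against it) the right way to think about where RLHF differs from ordinary supervised fine-tuning?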