
I am confused about the difference between fine-tuning and RLHF for LLMs. When should I use which? I know RLHF requires creating a reward model, which first rates responses in order to align them with human preferences, and then using this reward model to fine-tune the LLM.

But if that's the case, when is plain fine-tuning still relevant? When should I use which?


1 Answer


RLHF is just one way of fine-tuning a generative LLM, and it is used to align the model with human preferences.
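To make the reward-model step from the question concrete, here is a minimal toy sketch of the first stage of RLHF: training a reward model on human preference pairs with a Bradley-Terry objective. Everything here is an illustrative assumption, not a real RLHF implementation: responses are stand-in feature vectors (in practice they would come from an LLM), and the reward model is a linear function rather than a neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy preference data: each "response" is a 4-dim feature vector
# (a hypothetical stand-in for an LLM representation).
# Human raters preferred each `chosen` response over its `rejected` pair.
chosen = rng.normal(loc=1.0, size=(32, 4))
rejected = rng.normal(loc=-1.0, size=(32, 4))

w = np.zeros(4)  # linear reward model: r(x) = w . x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Bradley-Terry objective: maximize log sigmoid(r(chosen) - r(rejected)),
# i.e. push the reward of the preferred response above the rejected one.
for _ in range(200):
    margin = chosen @ w - rejected @ w
    # Gradient of -log sigmoid(margin) w.r.t. w, averaged over pairs.
    grad = ((sigmoid(margin) - 1.0)[:, None] * (chosen - rejected)).mean(axis=0)
    w -= 0.5 * grad

# Fraction of pairs where the trained model ranks the preferred response higher.
accuracy = (chosen @ w > rejected @ w).mean()
print(accuracy)
```

In the second RLHF stage, this reward model would score the LLM's sampled responses, and the LLM would be fine-tuned (e.g. with PPO) to increase that reward.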

However, you could also just curate a set of high-quality data and fine-tune on it (take a pretrained model and adjust it slightly), without any RLHF involved (see, for example, what phi-1 did).
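The plain supervised fine-tuning idea above can be sketched with a deliberately tiny toy: a bigram next-token model "pretrained" on a generic corpus, then nudged toward a small curated corpus by upweighting its counts. The corpora, the count-based model, and the upweighting factor are all illustrative assumptions; real supervised fine-tuning continues gradient training of a neural LLM on curated text.

```python
from collections import Counter, defaultdict

# Hypothetical tiny corpora: a larger generic one standing in for
# "pretraining" data, and a small curated one for fine-tuning.
pretrain_corpus = "the cat sat on the mat . the dog sat on the rug .".split()
finetune_corpus = "the cat sat politely .".split()

def count_bigrams(tokens):
    counts = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

# "Pretraining": estimate next-token counts from the generic corpus.
model = count_bigrams(pretrain_corpus)

def next_token(model, word):
    # Greedy prediction: most frequent continuation seen so far.
    return model[word].most_common(1)[0][0]

before = next_token(model, "sat")  # learned from generic data

# "Fine-tuning": slightly adjust the *same* model on curated data by
# adding its counts with a higher weight, instead of training from scratch.
for a, b in zip(finetune_corpus, finetune_corpus[1:]):
    model[a][b] += 3  # upweight the high-quality examples

after = next_token(model, "sat")  # behavior shifted by fine-tuning
print(before, after)
```

The point of the sketch: fine-tuning reuses everything the pretrained model learned and only shifts its behavior toward the curated data, and no reward model or human preference labels are involved.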

Alberto