I am confused about the difference between fine-tuning and RLHF for LLMs. I know that RLHF requires training a reward model, which first rates responses to capture human preferences, and that this reward model is then used to fine-tune the base model.
But if that's the case, is plain fine-tuning still relevant? When should I use which?
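To make my question concrete, here is my rough mental model of the reward-model step as a toy sketch (entirely hypothetical: a "response" is just a feature vector, the reward model is a linear scorer, and it's trained on human preference pairs with a Bradley-Terry-style logistic loss; real RLHF would then fine-tune the LLM against this learned reward):

```python
import math

def reward(w, x):
    # Reward model: linear score of response features x under weights w.
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # pairs: list of (preferred_features, rejected_features) from human labelers.
    # Bradley-Terry / logistic loss: the preferred response should score higher.
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            margin = reward(w, good) - reward(w, bad)
            # Gradient of -log(sigmoid(margin)) w.r.t. w is
            # -(1 - sigmoid(margin)) * (good - bad); we ascend the opposite way.
            g = 1.0 / (1.0 + math.exp(margin))  # = 1 - sigmoid(margin)
            for i in range(dim):
                w[i] += lr * g * (good[i] - bad[i])
    return w

# Hypothetical preference data: feature[0] = "helpfulness", feature[1] = "verbosity".
# The labelers prefer helpful, concise responses.
pairs = [
    ([1.0, 0.2], [0.1, 0.9]),
    ([0.9, 0.1], [0.3, 0.8]),
    ([0.8, 0.3], [0.2, 0.7]),
]
w = train_reward_model(pairs, dim=2)

# Step 2 of RLHF would fine-tune the LLM to maximize this learned reward;
# here we just use it to rank two candidate responses.
helpful = [0.95, 0.15]
rambling = [0.2, 0.85]
assert reward(w, helpful) > reward(w, rambling)
```

Is this two-stage picture (learn a reward from preferences, then optimize the model against it) the right way to think about where RLHF differs from ordinary supervised fine-tuning?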