Assume you have a pretrained transformer language model (M1) that has subsequently undergone reinforcement learning from human feedback (RLHF), yielding M2. I assume it is in principle possible to continue pretraining after RLHF on some additional documents, e.g. high-quality scientific papers (say, all of arXiv), yielding M3.
My question is: would there be a difference in the quality of answers to scientific questions (ones covered by the additional documents) between M3 (pretraining+RLHF+pretraining) and a model M4 for which pretraining on the additional documents is continued directly from M1, with RLHF applied only afterwards (pretraining+pretraining+RLHF)?
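To make the M2 → M3 step concrete, here is a minimal sketch of what I mean by "continuing pretraining" on an RLHF'd checkpoint, assuming the Hugging Face Transformers/Datasets stack; the checkpoint name, data file, and hyperparameters are placeholders, not a recommendation.

```python
# Sketch: continue causal-LM pretraining on an RLHF'd checkpoint (M2 -> M3).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "my-org/pretrained-plus-rlhf"   # hypothetical M2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Additional documents (e.g. scientific papers) as plain text.
papers = load_dataset("text", data_files={"train": "arxiv_papers.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = papers.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="m3", per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=1e-5),
    train_dataset=tokenized,
    # Standard next-token (causal LM) objective, i.e. the same loss as in pretraining.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

For M4 the same loop would simply be run on the original M1 checkpoint, with RLHF applied afterwards.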
The question concerns the interference between pretraining and RLHF. RLHF of course has to be performed on a pretrained model (and does not affect the linguistic and implicit world knowledge of that model too much), but continuing pretraining only after RLHF might cause more trouble: both general world knowledge and the behaviour learned during RLHF might be degraded (catastrophic forgetting), while the scientific knowledge might not advance much.
Is there an argument that makes it plausible that pretraining+RLHF+pretraining should work well, or, on the contrary, that it would not work at all?