How, if it is possible at all, can rewards from reinforcement learning be used to generate data for supervised learning? This is a very topical question, because human feedback usually comes in the form of a single-number rating, yet this rating has to be used to update models that were trained with supervised learning (even with a masked-data approach).

I am just starting to explore this topic, and so far I have found these entry points into the field:

I just have a sense that this should be possible and that I am unaware of some important trend.

TomR

0 Answers