
I would like to design a reward function. I am training two models. The first model classifies a set of texts (paragraphs and keywords), and I can also get its hidden states. The second model tries to generate keywords for those paragraphs.

I want to use the hidden states of the first model to reward the key phrases generated by the second model. How can I implement this reward function? I have never implemented one before.
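One possible approach is to score each generated key phrase by how close it is to the paragraph in the first model's hidden-state space. That score can then serve as the reward in a policy-gradient (REINFORCE) update of the generator. The sketch below is a minimal illustration, not the asker's method. It assumes the first model can map both a paragraph and a single key phrase into vectors of the same size. The `encode` callable is hypothetical; in practice it might return the classifier's pooled hidden state. The duplicate penalty and the -1.0 reward for an empty output are arbitrary choices made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyphrase_reward(
    paragraph_state: Vector,
    keyphrases: List[str],
    encode: Callable[[str], Vector],  # hypothetical: maps a phrase into the classifier's hidden space
    duplicate_penalty: float = 0.5,   # assumed value, tune for your data
) -> float:
    """Mean cosine similarity of the key phrases to the paragraph's hidden state,
    minus a penalty proportional to the fraction of repeated phrases."""
    if not keyphrases:
        return -1.0  # assumption: generating nothing gets the worst possible score
    scores = [cosine(paragraph_state, encode(k)) for k in keyphrases]
    n_duplicates = len(keyphrases) - len(set(keyphrases))
    return sum(scores) / len(scores) - duplicate_penalty * n_duplicates / len(keyphrases)


def reinforce_loss(log_probs: List[float], reward: float, baseline: float) -> float:
    """REINFORCE objective for one sampled output: -(reward - baseline) * sum(log pi).
    With plain floats this only computes the value; the same expression on
    framework tensors (e.g. PyTorch) is what you would backpropagate."""
    advantage = reward - baseline
    return -advantage * sum(log_probs)


if __name__ == "__main__":
    # Toy 2-D "hidden states", standing in for the first model's outputs.
    toy_vectors = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}
    paragraph = [1.0, 0.0]
    r = keyphrase_reward(paragraph, ["good", "bad"], toy_vectors.__getitem__)
    print(r)  # prints 0.5
    print(reinforce_loss([-0.7, -0.2], reward=r, baseline=0.3))
```

In a real training loop, `log_probs` would be the generator's log-probabilities of the sampled key-phrase tokens, kept as tensors so gradients flow back into the second model. `baseline` could be a running mean of recent rewards to reduce the variance of the update.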

nbro
No Na
    The question sounds a bit confusing to me. Can you reframe it and explain in more detail which models you're using and where RL comes into play? I don't get what kind of policy you want to train. From what I understood, you have a model that classifies some text into paragraphs and another one that extracts keywords for each paragraph. Is that correct? – Edoardo Guerriero Apr 16 '20 at 13:39

0 Answers