I'm thinking of a situation like a game (say, chess) where the real objective/reward is actually determined at the very end.

I understand that it's important/helpful to do reward shaping with intermediate rewards, so that the agent can get clues of what is good/bad behavior leading to the final result. However, I would greatly appreciate advice or discussion about whether these intermediate rewards should be phased out over time.

For example, let's say the agent is playing chess. I imagine it's helpful to give rewards/punishments for captured/lost pieces and then a BIG reward/punishment at the end for victory/defeat. As training goes on, though, would you recommend decaying/removing the intermediate rewards?
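
To make the question concrete, here is a minimal sketch of the kind of decay I have in mind. Everything in it is an assumption for illustration only: the names `shaping_weight` and `total_reward`, the linear schedule, the `decay_episodes` horizon, and the `material_scale` factor are all hypothetical, not taken from any library or paper.

```python
def shaping_weight(episode: int, decay_episodes: int = 10_000) -> float:
    """Linearly anneal the shaping weight from 1.0 down to 0.0.

    decay_episodes is a hypothetical hyperparameter: the number of
    training episodes after which intermediate rewards are fully removed.
    """
    return max(0.0, 1.0 - episode / decay_episodes)


def total_reward(terminal_reward: float,
                 material_delta: float,
                 episode: int,
                 decay_episodes: int = 10_000,
                 material_scale: float = 0.1) -> float:
    """Combine the true end-of-game reward with a decaying material bonus.

    terminal_reward: win/loss signal (0.0 on non-terminal steps).
    material_delta:  value of pieces captured minus pieces lost this step.
    material_scale:  assumed factor keeping the bonus small relative to
                     the win/loss reward.
    """
    w = shaping_weight(episode, decay_episodes)
    return terminal_reward + w * material_scale * material_delta
```

Early in training a capture would still earn a bonus through `total_reward(0.0, 3.0, episode=0)`, while after `decay_episodes` episodes only the win/loss signal would remain.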

Vladimir Belik

0 Answers