Recently, some work has been done on planning and learning in non-Markovian decision processes, that is, decision-making with temporally extended rewards. In these settings, a particular reward is received only when a particular temporal logic formula (e.g., an LTL or CTL formula) is satisfied. However, I cannot find any work on learning which rewards correspond to which temporally extended behaviors.
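To make the setting concrete, here is a minimal sketch of a temporally extended reward. The state names `a`, `b`, `c` and the function `temporally_extended_reward` are hypothetical, chosen only for illustration. The reward is paid once, at the first visit to `b` that follows an earlier visit to `a`, which roughly corresponds to satisfying an LTL-style formula such as F(a ∧ F b).

```python
from typing import List, Sequence


def temporally_extended_reward(trajectory: Sequence[str]) -> List[float]:
    """Return the per-step rewards for a trajectory of state labels.

    Illustrative assumption: a reward of 1.0 is given once, at the first
    step where 'b' is visited after 'a' has been visited earlier.
    """
    seen_a = False   # memory of the history: has 'a' occurred yet?
    paid = False     # the reward is only given once
    rewards = []
    for state in trajectory:
        reward = 0.0
        if state == "b" and seen_a and not paid:
            reward = 1.0
            paid = True
        if state == "a":
            seen_a = True
        rewards.append(reward)
    return rewards


# Both trajectories end in the same state 'b', yet only the first is rewarded.
print(temporally_extended_reward(["c", "a", "c", "b"]))
print(temporally_extended_reward(["c", "b", "c", "b"]))
```

The reward at the final step differs even though the current state is identical, so it cannot be written as a function of the current state alone. The learning problem I am asking about is recovering this history-to-reward mapping from experience, when the agent is not told the formula.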

In my searches, I came across k-order MDPs (which are non-Markovian), but I did not find any RL research on them.
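As I understand the term, in a k-order MDP the dynamics and rewards may depend on the last k states rather than only the current one. A minimal sketch of that idea, assuming this reading, is below: it pairs each step with a window of the most recent k states. The helper name `augment_with_history` is hypothetical.

```python
from collections import deque
from typing import Hashable, List, Sequence, Tuple


def augment_with_history(
    trajectory: Sequence[Hashable], k: int
) -> List[Tuple[Hashable, ...]]:
    """Replace each state by the tuple of the (up to) k most recent states.

    Assumption: anything that depends on the last k states can be read off
    this augmented state.
    """
    window = deque(maxlen=k)  # automatically drops states older than k steps
    augmented = []
    for state in trajectory:
        window.append(state)
        augmented.append(tuple(window))
    return augmented


print(augment_with_history(["c", "a", "b"], k=2))
```

A fixed window like this only covers dependencies of bounded length. Rewards tied to a temporal logic formula can depend on events arbitrarily far in the past, which is why I treat the two settings separately.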

nbro
Gavin Rens
  • Non-Markovian reward functions are quite an old concept. They were first introduced in the theory of control systems. You can take any objective function of a control system and apply planning and learning concepts to it. – abunickabhi Apr 07 '19 at 18:11
  • By "reward functions", do you mean "value functions"? – nbro Apr 08 '19 at 09:36
  • I don't mean the value function. I am assuming that the environment/society knows which rewards to give, and the agent wants to learn a model of which sequences of visited states result in which rewards. – Gavin Rens Apr 08 '19 at 19:46

0 Answers