What research has been done on learning non-Markovian reward functions?

Asked Apr 07 '19 at 17:45

Recently, some work has been done on planning and learning in Non-Markovian Decision Processes, that is, decision-making with temporally extended rewards. In these settings, a particular reward is received only when a particular temporal logic formula (an LTL or CTL formula) is satisfied. However, I cannot find any work on learning which rewards correspond to which temporally extended behavior.
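To make the setting concrete: a temporally extended reward can be encoded as a small finite automaton (often called a reward machine) that tracks progress on the formula and emits reward on certain transitions. The sketch below hand-codes one for the task "eventually a, and after that eventually b" (roughly F(a ∧ F b) in LTL). The labels, the transition table and the function name are hypothetical illustrations, and this is not a general LTL-to-automaton translation.

```python
from typing import Dict, List, Tuple

# Hypothetical reward machine for "eventually a, and after that eventually b".
# Keys: (automaton_state, label observed this step) -> (next_automaton_state, reward).
# Any (state, label) pair not listed self-loops with reward 0.
Transitions = Dict[Tuple[int, str], Tuple[int, float]]

EVENTUALLY_A_THEN_B: Transitions = {
    (0, "a"): (1, 0.0),  # a observed: progress, but no reward yet
    (1, "b"): (2, 1.0),  # b observed after a: formula satisfied, reward once
}                        # state 2 has no outgoing entries, so it is absorbing


def run_reward_machine(labels: List[str],
                       transitions: Transitions = EVENTUALLY_A_THEN_B,
                       start: int = 0) -> List[float]:
    """Return the reward emitted at each step of a trace of (hypothetical) state labels."""
    u = start
    rewards: List[float] = []
    for label in labels:
        u, r = transitions.get((u, label), (u, 0.0))
        rewards.append(r)
    return rewards


# The same label "b" earns reward at step 3 but not at steps 0 or 4,
# so the reward is not a function of the current label alone.
print(run_reward_machine(["b", "a", "c", "b", "b"]))  # [0.0, 0.0, 0.0, 1.0, 0.0]
```

Because the automaton state `u` is finite, pairing it with the environment state yields a product process that is Markovian again, which is a common way this line of work makes such rewards tractable.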

In my searches, I came across k-order MDPs (which are non-Markovian), but I did not find any RL research on them.
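In a k-order MDP, transitions and rewards depend on the last k states, so augmenting the state with a window of the last k states makes the process Markovian again (at the cost of a state space that grows roughly as |S|^k). The sketch below shows that augmentation and, as a deliberately naive illustration of learning a bounded-history reward from traces, a tabular average of observed rewards per window. The function names and the averaging estimator are hypothetical assumptions, not a method from the literature.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, List, Tuple

Window = Tuple[Hashable, ...]


def windows(states: List[Hashable], k: int) -> List[Window]:
    """Augmented state at each step: the last (up to) k states, oldest first."""
    buf: Deque[Hashable] = deque(maxlen=k)
    out: List[Window] = []
    for s in states:
        buf.append(s)
        out.append(tuple(buf))
    return out


def fit_window_rewards(episodes: List[Tuple[List[Hashable], List[float]]],
                       k: int) -> Dict[Window, float]:
    """Average observed reward per k-window (a tabular estimate of a k-order reward)."""
    total: Dict[Window, float] = defaultdict(float)
    count: Dict[Window, int] = defaultdict(int)
    for states, rewards in episodes:
        for w, r in zip(windows(states, k), rewards):
            total[w] += r
            count[w] += 1
    return {w: total[w] / count[w] for w in total}


episodes = [
    (["x", "y", "z"], [0.0, 0.0, 1.0]),  # reaching z from y is rewarded
    (["x", "z"], [0.0, 0.0]),            # reaching z from x is not
]
model = fit_window_rewards(episodes, k=2)
print(model[("y", "z")], model[("x", "z")])  # 1.0 0.0
```

With k=1 the same data would give state z an average reward of 0.5, which is the symptom of a reward that is not Markovian in the raw state.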

edited Dec 19 '21 at 18:47 by nbro

asked by Gavin Rens

Non-Markovian reward functions are quite an old concept. They were first introduced in control-systems theory. You can take any objective function of a control system and apply planning and learning concepts to it. – abunickabhi Apr 07 '19 at 18:11
By "reward functions", do you mean "value functions"? – nbro Apr 08 '19 at 09:36
I don't mean the value function. I am assuming that the environment/society knows which rewards to give, and the agent wants to learn a model of which sequences of visited states result in which rewards. – Gavin Rens Apr 08 '19 at 19:46


0 Answers