
I'm trying to implement a research paper, as explained in this other post. The author of the paper assumes a reward $R$ that is a function of both states and actions, while the code (and the MDP) I'm using to test the algorithm assumes $R$ is a function of states only.

My question is:

Let $\mathcal{X}$ be the set of states of an MDP and $\mathcal{A}$ its set of actions. Suppose I have four states ($1$, $2$, $3$, $4$), two actions $a$ and $b$, and a reward function $R: \mathcal{X} \to \mathbb{R}$ such that

$R(1) = 0$

$R(2) = 0$

$R(3) = 0$

$R(4) = 1$

If I need to replace the current reward function with a new reward function $R: \mathcal{X} \times \mathcal{A} \to \mathbb{R}$, is it OK to define it as $R(s, a) = R(s)$ for all $a$? That is:

$R(1,a) = 0$

$R(1,b) = 0$

$R(2,a) = 0$

$R(2,b) = 0$

$R(3,a) = 0$

$R(3,b) = 0$

$R(4,a) = 1$

$R(4,b) = 1$

More generally, what's the correct way of generalising a reward function $R: \mathcal{X} \to \mathbb{R}$ to a reward function $R: \mathcal{X} \times \mathcal{A} \to \mathbb{R}$?

1 Answer


As explained here, I can write $R(s, a) = R(s)$ for all $a$, since the reward in my specific MDP depends exclusively on the state $s$.
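
For concreteness, here is a minimal Python sketch of this lifting. The state labels, action names, and function/variable names are illustrative only (they are not taken from the asker's code); the point is just that the action argument is accepted and ignored.

```python
# Minimal sketch: lifting a state-only reward R(s) to a state-action reward
# R(s, a) by ignoring the action, i.e. R(s, a) = R(s) for every action a.

# Hypothetical state-only reward table for the four-state example above.
reward_state = {1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}
actions = ["a", "b"]

def reward_state_action(state, action):
    """R(s, a) = R(s): the action has no effect on the reward."""
    return reward_state[state]

# Equivalently, as an explicit table over all (state, action) pairs.
reward_table = {(s, a): r for s, r in reward_state.items() for a in actions}

assert reward_state_action(4, "a") == reward_table[(4, "a")] == 1.0
```

Either form (a wrapper function or a fully enumerated table) gives an MDP whose optimal policies and value functions are unchanged, since every action yields the same reward in a given state.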

  • Just a clarification, if you're trying to reproduce a certain paper that uses $R(s, a)$, why do you say that in your MDP the reward function only depends on $s$? I suppose you're testing their approach in a simpler environment. Right? Is the code you're using available? – nbro Jan 31 '21 at 19:04
  • Yes, exactly. Since the paper doesn't present an example environment, I'm using the one from [here](https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-416.pdf). The code isn't available yet (I couldn't find any implementation online, so I'm writing my own). – ИванКарамазов Feb 01 '21 at 10:13