
I am learning reinforcement learning for games by following Gridworld examples. Apologies in advance if this is a basic question; I am very new to reinforcement learning.

I am slightly confused by scenarios where the probabilities of moving up, down, left, and right are not provided or stated. In this case, I assume we should follow the optimal policy, and therefore apply the Bellman optimality equation:

$V(s) = \max_a \left( R(s,a) + \gamma V(s') \right)$
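If that equation is right, a single backup for one state follows directly from it. Here is a minimal sketch, assuming deterministic transitions (each action leads to exactly one next state); the names `R`, `next_state`, and `V` are hypothetical placeholders, not from any particular library:

```python
# A minimal sketch of one Bellman optimality backup for a single state,
# assuming deterministic transitions. R, next_state, and V are hypothetical.
def bellman_backup(s, actions, R, next_state, V, gamma):
    """Return max over actions of R(s, a) + gamma * V(s')."""
    return max(R(s, a) + gamma * V[next_state(s, a)] for a in actions)
```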

The cost of any movement is 0, and the agent can choose to terminate at a numbered square to collect a reward equal to that square's number. This is why my square closest to the reward takes the value 8: the action that moves into the numbered square terminates and collects the reward.
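To sanity-check this reasoning, here is a small value-iteration sketch under that setup. The 1×3 grid, the reward of 8, and γ = 0.9 are illustrative assumptions on my part, not the actual grid from the image:

```python
# Value iteration under the setup above: moves cost 0, and stepping onto the
# numbered square collects its number and ends the episode. The 1x3 grid,
# the reward of 8, and GAMMA = 0.9 are hypothetical, not read off the image.
GAMMA = 0.9
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

TERMINAL = (0, 2)                               # the numbered square
TERMINAL_REWARD = 8
V = {(0, 0): 0.0, (0, 1): 0.0, TERMINAL: 0.0}   # terminal value stays 0

def step(s, move):
    """Deterministic transition; bumping into a wall leaves you in place."""
    nxt = (s[0] + move[0], s[1] + move[1])
    return nxt if nxt in V else s

for _ in range(100):                            # sweep until values settle
    for s in V:
        if s == TERMINAL:
            continue                            # episode has ended here
        backups = []
        for m in MOVES:
            nxt = step(s, m)
            r = TERMINAL_REWARD if nxt == TERMINAL else 0   # moving costs 0
            backups.append(r + GAMMA * V[nxt])
        V[s] = max(backups)

print(V)  # (0, 1) -> 8.0 next to the reward, (0, 0) -> 0.9 * 8 = 7.2
```

With γ = 1 every reachable square would back up to 8; with γ < 1 the values decay with distance from the reward, which is a quick way to check whether the surrounding squares are filled in consistently.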

Would this be the correct way to determine the values of the surrounding grid squares?

[Image: the gridworld with my proposed state values]

  • Could you please put your **specific question** in the title? "Determine Gridworld values" is not a question and it's also not specific. Thank you. – nbro Apr 14 '22 at 21:51
  • Why is "In this scenario, I assume we assume the optimal policy" true? Why would you assume the optimal policy if you don't know the transition dynamics? That doesn't make sense to me. Anyway, are you asking if that equation is the right one to update the values? – nbro Apr 20 '22 at 12:49

0 Answers