For questions related to episodic tasks in reinforcement learning, i.e. tasks that can be naturally divided into episodes (e.g. multiple games of tic-tac-toe, where each game would be an episode).
Questions tagged [episodic-tasks]
6 questions
3 votes, 2 answers
In the on-policy state distribution for episodic tasks, why don't we take into account the length of the episode?
On page 199 of Sutton & Barto's "Reinforcement Learning: An Introduction" (2nd edition), the authors describe the on-policy distribution for episodic tasks in the following box:
I don't understand how this can be done without taking the length of the episode…

user118967 (reputation 208)
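
For reference, the book defines the on-policy distribution for episodic tasks through η(s), the expected number of time steps spent in state s during a single episode: η(s) = h(s) + Σ_s̄ η(s̄) Σ_a π(a|s̄) p(s|s̄,a), followed by μ(s) = η(s) / Σ_s' η(s'). The denominator Σ_s' η(s') is the expected number of time steps per episode, i.e. the expected episode length. The minimal Python sketch below evaluates both equations on a hypothetical three-state chain; the transition probabilities and start distribution are made-up values, and the policy π is folded into the matrix P.

```python
# Minimal sketch: on-policy distribution for an episodic task.
# The 3-state chain is a hypothetical example, not taken from the book.

# P[sb][s] = sum_a pi(a|sb) * p(s|sb, a), restricted to nonterminal states.
# Rows sum to less than 1; the missing mass is the probability of terminating.
P = [
    [0.0, 0.9, 0.0],  # from state 0: 0.9 -> state 1, 0.1 -> terminal
    [0.0, 0.5, 0.4],  # from state 1: 0.5 stay, 0.4 -> state 2, 0.1 -> terminal
    [0.0, 0.0, 0.0],  # from state 2: always terminates
]
h = [1.0, 0.0, 0.0]  # start-state distribution: every episode starts in state 0
n = len(h)

# eta(s) = h(s) + sum_sb eta(sb) * P[sb][s]; iterate to the fixed point
# (this converges because every state eventually reaches termination).
eta = [0.0] * n
for _ in range(1000):
    eta = [h[s] + sum(eta[sb] * P[sb][s] for sb in range(n)) for s in range(n)]

# sum(eta) is the expected number of time steps per episode (expected
# episode length); normalizing by it yields mu(s).
expected_length = sum(eta)
mu = [e / expected_length for e in eta]

print(eta)              # approx. [1.0, 1.8, 0.72]
print(expected_length)  # approx. 3.52
print(mu)
```
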
2 votes, 4 answers
How can the Cart Pole problem be a continuing task?
In Sutton and Barto's "Reinforcement Learning: An Introduction" (2nd edition), there is an example of the pole-balancing problem (Example 3.4).
In this example, they write that this problem can be treated either as an episodic task or as a continuing task.
I…

user3595632 (reputation 175)
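
Example 3.4 contrasts two reward schemes. In the episodic formulation the reward is +1 on every time step on which failure did not occur, so the return is the number of steps until failure. In the continuing formulation with discounting, the reward is -1 on each failure and 0 at all other times, so the return is related to -γ^K, where K is the number of steps before failure. A minimal sketch comparing the two; K and γ are arbitrary illustrative values, and `discounted_return` is a hypothetical helper.

```python
# Minimal sketch of the two pole-balancing reward schemes.
# K (steps before failure) and gamma are hypothetical values.

def discounted_return(rewards, gamma=1.0):
    # G = sum_k gamma^k * rewards[k], where rewards[k] is the (k+1)-th reward after time t
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

K = 50        # the pole falls after K successful steps
gamma = 0.9   # discount factor for the continuing formulation

# Episodic: +1 per step until failure ends the episode (no discounting needed).
g_episodic = discounted_return([1.0] * K)

# Continuing: 0 per step, -1 on failure. Only the first failure is shown here;
# after a reset the task carries on, and later failures add further discounted terms.
g_continuing = discounted_return([0.0] * K + [-1.0], gamma)

print(g_episodic)                                  # 50.0
print(abs(g_continuing - (-gamma ** K)) < 1e-12)   # True
```

In either case the return is maximized by keeping the pole balanced for as long as possible: a larger K gives a larger episodic return and a continuing return closer to 0.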
2 votes, 1 answer
Is it appropriate to represent 'total failure' as an absorbing state?
My understanding is that, in Markov decision processes, absorbing states are states that can transition only to themselves and that these transitions generate rewards of 0. I know that absorbing states are commonly used to represent goals, so an…

K-- (reputation 121)
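
The definition in the question can be checked numerically. Assuming a discounted setting (γ < 1), a zero-reward self-loop adds nothing to the return, so ending the episode at the failure state and continuing forever in an absorbing failure state give the same return, and the absorbing state's value converges to 0 under repeated Bellman backups. A minimal sketch; the discount, the -1 step cost and the -10 failure penalty are made-up values.

```python
# Minimal sketch: an absorbing "total failure" state with zero-reward self-loops.
# gamma, the -1 step cost and the -10 failure penalty are hypothetical values.

def discounted_return(rewards, gamma):
    # G = sum_k gamma^k * rewards[k]
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

gamma = 0.95

# Two ordinary steps, then a penalty on the transition into the failure state.
episode = [-1.0, -1.0, -10.0]

# If failure is absorbing, the trajectory continues forever with reward 0;
# here the infinite tail is truncated after 1000 self-transitions.
absorbed = episode + [0.0] * 1000

g_terminated = discounted_return(episode, gamma)
g_absorbed = discounted_return(absorbed, gamma)

# Bellman backup for the absorbing state: V(s) = 0 + gamma * V(s).
# For gamma < 1 its only fixed point is 0, whatever the initial guess.
v_absorbing = 5.0
for _ in range(1000):
    v_absorbing = 0.0 + gamma * v_absorbing

print(g_terminated == g_absorbed)  # True
print(v_absorbing)                 # extremely close to 0
```
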
1 vote, 1 answer
Is it necessary to have a constant reward in the terminal state?
I have downloaded the grid world project from this link. I have executed the project multiple times using:
python gridworld.py -k 20 -a q -r -0.2 -s 90
I have noticed that the rewards of the terminal states change over time. The grid world at…

AAA (reputation 111)
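
One possible explanation, assuming the numbers shown on the terminal states are the agent's learned Q-value estimates rather than the rewards themselves: each Q-learning update moves an estimate a fraction α of the way toward its target. For a transition into a terminal state, the target is just the fixed reward (nothing is bootstrapped afterward), so the displayed value creeps toward that reward over successive episodes. A minimal sketch; the learning rate, discount and reward are made-up values, and `q_update` is a hypothetical helper, not a function from the project in question.

```python
# Minimal sketch: tabular Q-learning update for a transition into a terminal state.
# alpha, gamma and the terminal reward are hypothetical values.

def q_update(q, reward, next_max, alpha, gamma, terminal):
    # No bootstrapping past a terminal transition: the target is just the reward.
    target = reward if terminal else reward + gamma * next_max
    return q + alpha * (target - q)

alpha, gamma = 0.5, 0.9
terminal_reward = 1.0  # the reward itself never changes

q = 0.0
history = []
for _ in range(20):
    q = q_update(q, terminal_reward, next_max=0.0, alpha=alpha, gamma=gamma, terminal=True)
    history.append(q)

print(history[:3])  # [0.5, 0.75, 0.875]
```

The estimate after n updates here is 1 - 0.5^n, so it changes every episode even though the reward is constant.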
1 vote, 1 answer
PPO: dealing with variable episodic length
I'm dealing with a project that has episodes of variable length, ranging from just 3 steps to 20 steps. Now, I'm guessing that this may cause problems with GAE, as actions in large episodes will have much larger advantages than actions in smaller…

Antonis Karvelas (reputation 65)
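
Generalized Advantage Estimation computes δ_t = r_t + γV(s_{t+1}) - V(s_t) and A_t = δ_t + γλA_{t+1}. When a rollout buffer holds several episodes of different lengths, a common approach is to use done flags to cut both the bootstrap term and the recursion at each episode boundary, so no advantage leaks from one episode into another. A minimal sketch, assuming a flat buffer of concatenated episodes; `compute_gae` and its arguments are illustrative, not any particular library's API.

```python
# Minimal sketch of GAE over a buffer containing several variable-length episodes.

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """rewards[t], values[t] = V(s_t), dones[t] = True if the episode ended after step t.
    last_value is V of the state after the final step (ignored if dones[-1] is True)."""
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD error; do not bootstrap across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Reset the recursion at the boundary as well.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns

# Two hypothetical episodes of lengths 3 and 2, reward 1 per step, V = 0.
adv, ret = compute_gae(
    rewards=[1.0] * 5,
    values=[0.0] * 5,
    dones=[False, False, True, False, True],
    last_value=0.0,
    gamma=1.0,
    lam=1.0,
)
print(adv)  # [3.0, 2.0, 1.0, 2.0, 1.0]
```

The done mask only prevents leakage across boundaries; with positive per-step rewards, earlier steps of longer episodes still get larger undiscounted advantages, which is one reason many PPO implementations also standardize advantages per batch.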
0 votes, 1 answer
Could Softmax Action Selection be useful to solve an episodic task with more than 100000 possible states and 2000 actions?
I am new to the field of RL. I am trying to use a tabular method, Q-learning, to solve a problem that takes a lot of time to compute, so I would like to know whether there are more efficient methods for it.
Why are tabular methods not useful in…

Aquila (reputation 33)
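
Softmax (Boltzmann) action selection picks action a with probability proportional to exp(Q(s,a)/τ), where the temperature τ controls how greedy the choice is. It changes how actions are chosen during exploration, not how many values must be stored or learned: a full table for 100,000 states and 2000 actions still has 100,000 × 2000 = 2 × 10^8 entries. A minimal Python sketch; the function names are illustrative.

```python
import math
import random

def softmax_probs(q_values, tau):
    # P(a) proportional to exp(Q(s,a) / tau). Subtracting the max avoids
    # overflow in exp() and does not change the resulting probabilities.
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_action(q_values, tau, rng=random):
    # Sample one action index according to the softmax probabilities.
    probs = softmax_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

# High tau -> close to uniform; low tau -> close to greedy.
print(softmax_probs([1.0, 2.0, 3.0], tau=1.0))
print(softmax_probs([1.0, 2.0, 3.0], tau=0.01))
print(softmax_action([0.0] * 2000, tau=1.0, rng=random.Random(0)))
```
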