Questions tagged [sarsa]

For questions related to SARSA, an on-policy reinforcement learning algorithm whose name comes from the quintuple it updates on: (state, action, reward, next state, next action), i.e. (s, a, r, s', a').
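For context, SARSA's one-step update uses exactly that quintuple. A minimal tabular sketch (function and variable names here are illustrative, not from any particular library):

```python
# One-step on-policy SARSA backup using the quintuple (s, a, r, s', a'):
#   Q(s, a) <- Q(s, a) + alpha * [r + gamma * Q(s', a') - Q(s, a)]
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    td_target = r + gamma * Q[(s_next, a_next)]   # bootstrap on the action actually taken next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])  # move Q(s, a) toward the TD target
    return Q
```

Here `Q` is any mapping from (state, action) pairs to values, e.g. a plain dict initialized to 0.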

43 questions
11 votes, 1 answer

Are Q-learning and SARSA the same when action selection is greedy?

I'm currently studying reinforcement learning and I'm having difficulties with question 6.12 in Sutton and Barto's book. Suppose action selection is greedy. Is Q-learning then exactly the same algorithm as SARSA? Will they make exactly the same…
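One piece of the answer can be seen directly: when the next action a' is selected greedily, SARSA's bootstrap term Q(s', a') equals Q-learning's max over actions, so the two update targets coincide on that step. A small sketch with hypothetical values (whether the two algorithms then behave identically overall is the subtle part of the exercise):

```python
q_next = {"left": 0.2, "right": 0.7, "stay": 0.5}  # hypothetical Q(s', .)
a_greedy = max(q_next, key=q_next.get)             # greedy choice of a'

sarsa_term = q_next[a_greedy]           # SARSA bootstraps on the chosen a'
q_learning_term = max(q_next.values())  # Q-learning bootstraps on the max

assert sarsa_term == q_learning_term    # targets coincide under greedy selection
```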
10 votes, 1 answer

Can Q-learning be used in a POMDP?

Can Q-learning (and SARSA) be directly used in a Partially Observable Markov Decision Process (POMDP)? If not, why not? My intuition is that the policies learned will be terrible because of partial observability. Are there ways to transform these…
8 votes, 2 answers

How should I handle action selection in the terminal state when implementing SARSA?

I recently started learning about reinforcement learning. Currently, I am trying to implement the SARSA algorithm. However, I do not know how to deal with $Q(s', a')$, when $s'$ is the terminal state. First, there is no action to choose from in this…
Hai Nguyen
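A common convention (consistent with the definition of the return, which is zero from a terminal state) is to treat Q(s', a') as 0 whenever s' is terminal, so the final update's target reduces to just r. A minimal sketch with illustrative names:

```python
def sarsa_target(r, gamma, Q, s_next, a_next, terminal):
    # From a terminal state the return is 0 by definition, so no a' is needed
    # and nothing is bootstrapped.
    if terminal:
        return r
    return r + gamma * Q[(s_next, a_next)]
```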
6 votes, 1 answer

Is Expected SARSA an off-policy or on-policy algorithm?

I understand that SARSA is an on-policy algorithm, and Q-learning an off-policy one. Sutton and Barto's textbook describes Expected Sarsa as follows: In these cliff walking results Expected Sarsa was used on-policy, but in general it might use a…
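For reference, Expected SARSA replaces SARSA's sampled bootstrap term Q(s', a') with an expectation under the (target) policy at s', target = r + γ Σ_a π(a|s') Q(s', a) — which is why the behavior policy and the policy in the update need not be the same. A sketch with illustrative names:

```python
def expected_sarsa_target(r, gamma, pi_next, q_next):
    # pi_next[a] = pi(a|s'), q_next[a] = Q(s', a). Taking the expectation
    # removes the sampling variance from the random choice of a'.
    expected_q = sum(pi_next[a] * q_next[a] for a in q_next)
    return r + gamma * expected_q
```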
5 votes, 1 answer

Understanding the n-step off-policy SARSA update

In Sutton & Barto's book (2nd ed.), page 149, there is equation (7.11). I am having a hard time understanding this equation. I would have thought that we should be moving $Q$ towards $G$, where $G$ would be corrected by importance sampling, but only…
5 votes, 1 answer

Expected SARSA vs SARSA in "RL: An Introduction"

Sutton and Barto state in the 2018-version of "Reinforcement Learning: An Introduction" in the context of Expected SARSA (p. 133) the following sentences: Expected SARSA is more complex computationally than Sarsa but, in return, it eliminates the…
4 votes, 2 answers

Is the optimal policy the one with the highest cumulative reward (Q-Learning vs SARSA)?

I was looking at the following diagram. The reward obtained with SARSA is higher. However, the path that Q-learning chooses is eventually the optimal one, isn't it? Why is the SARSA reward higher if it is not choosing the best path? Shouldn't the…
Pulse9
4 votes, 1 answer

How should I generate datasets for a SARSA agent when the environment is not simple?

I am currently working on my master's thesis and going to apply Deep-SARSA as my DRL algorithm. The problem is that there are no datasets available, and I guess that I should generate them somehow. Dataset generation seems a common feature in this…
4 votes, 1 answer

When do SARSA and Q-Learning converge to optimal Q values?

Here's another interesting multiple-choice question that puzzles me a bit. In tabular MDPs, if using a decision policy that visits all states an infinite number of times, and in each state, randomly selects an action, then: Q-learning will…
stoic-santiago
3 votes, 1 answer

Can we also estimate $V_{\pi}$ with SARSA?

For SARSA, I know we can estimate the action value $Q(s,a)$, and the relationship between $V(s)$ and $Q(s,a)$ is $V_{\pi}(s) = \sum_{a \in \mathcal{A}} \pi(a|s)Q_{\pi} (s,a)$. So my question is, can we simply estimate $V_{\pi}$ by applying the above…
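That relationship can indeed be evaluated directly once π and Q are known. A sketch (illustrative names), assuming the ε-greedy policy that SARSA typically follows, so π is itself derived from Q:

```python
def v_from_q(q_s, epsilon=0.1):
    # V_pi(s) = sum_a pi(a|s) * Q_pi(s, a), with pi the epsilon-greedy
    # policy induced by Q: probability epsilon/n for each action, plus
    # (1 - epsilon) extra mass on the greedy action.
    n = len(q_s)
    greedy = max(q_s, key=q_s.get)
    v = 0.0
    for a, q in q_s.items():
        pi_a = (1 - epsilon) + epsilon / n if a == greedy else epsilon / n
        v += pi_a * q
    return v
```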
3 votes, 1 answer

When does backward propagation occur in n-step SARSA?

I am trying to understand the algorithm for n-step SARSA from Sutton and Barto (2nd Edition). As I understand it, this algorithm should update n state-action values, but I cannot see where it is propagated backward. Can someone explain to me how…
nehalem
3 votes, 1 answer

How to determine if Q-learning has converged in practice?

I am using Q-learning and SARSA to solve a problem. The agent learns to go from the start to the goal without falling in the holes. At each state, I can choose the action corresponding to the maximum Q value at the state (the greedy action that the…
3 votes, 1 answer

Can the agent wait until the end of the episode to determine the reward in SARSA?

From Sutton and Barto's book Reinforcement Learning (Adaptive Computation and Machine Learning series) (p. 99), the following definition of first-visit MC prediction, for estimating $V \approx V_\pi$, is given: Is determining the reward for each…
blue-sky
3 votes, 0 answers

Evaluating a policy learned using Q-learning

I have been reading literature on reinforcement learning in healthcare. I am slightly confused about policy evaluation for SARSA and Q-learning. To my knowledge, I believe that SARSA is used for policy evaluation, to find the Q values of…
calveeen
3 votes, 1 answer

What is the difference between the $\epsilon$-greedy and softmax policies?

Could someone explain to me the key difference between the $\epsilon$-greedy policy and the softmax policy? In particular, in the context of the SARSA and Q-Learning algorithms. I understood the main difference between these two algorithms, but…