Highest Voted 'deterministic-policy' Questions - Artificial Intelligence Stack Exchange

16 votes, 3 answers

Is the optimal policy always stochastic if the environment is also stochastic?

Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic? Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…

asked Feb 15 '19 at 13:20

nbro

8 votes, 3 answers

What is the difference between a stochastic and a deterministic policy?

In reinforcement learning, there are the concepts of stochastic (or probabilistic) and deterministic policies. What is the difference between them?

reinforcement-learning comparison policies deterministic-policy stochastic-policy

asked May 12 '19 at 18:50

nbro

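To make the distinction concrete, here is a minimal Python sketch (not taken from the question or its answers). A deterministic policy maps each state to a single action, while a stochastic policy maps each state to a probability distribution over actions, from which the agent samples. The states `s0`/`s1` and the probabilities are hypothetical.

```python
import random

# Deterministic policy: pi(s) returns exactly one action (hypothetical states).
deterministic_policy = {"s0": "right", "s1": "up"}

# Stochastic policy: pi(a | s) is a probability distribution over actions.
stochastic_policy = {
    "s0": {"up": 0.1, "down": 0.1, "left": 0.1, "right": 0.7},
    "s1": {"up": 0.5, "down": 0.0, "left": 0.25, "right": 0.25},
}


def act_deterministic(state):
    """Always returns the same action for a given state."""
    return deterministic_policy[state]


def act_stochastic(state, rng=random):
    """Samples an action according to pi(. | state)."""
    probs = stochastic_policy[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(act_deterministic("s0"))                        # always "right"
    print([act_stochastic("s0", rng) for _ in range(5)])  # varies with the seed
```

A deterministic policy is the special case of a stochastic one in which each state's distribution puts probability 1 on a single action.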

5 votes, 1 answer

What is the loss for policy gradients with continuous actions?

I know that, with policy gradients in an environment with a discrete action space, the parameters are updated with $$ \Delta \theta_{t}=\alpha \nabla_{\theta} \log \pi_{\theta}\left(a_{t} \mid s_{t}\right) v_{t} $$ where $v_t$ could be many things that represent how…

neural-networks reinforcement-learning policy-gradients deterministic-policy

asked Sep 30 '20 at 22:12

S2673

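For continuous actions, one common choice (assumed here, not stated in the question) is a Gaussian policy $\pi_\theta(a \mid s) = \mathcal{N}(a; \theta^\top s, \sigma^2)$ with a linear mean and a fixed $\sigma$. The same update rule then applies, with $\nabla_\theta \log \pi_\theta(a \mid s) = \frac{a - \theta^\top s}{\sigma^2} s$. The plain-Python sketch below implements one such update; the function names are illustrative. With an autodiff framework, the same step is usually obtained by minimizing the surrogate loss $-\log \pi_\theta(a_t \mid s_t)\, v_t$.

```python
import math
import random


def gaussian_log_prob(a, mean, sigma):
    """log N(a; mean, sigma^2) for a scalar action."""
    return -0.5 * ((a - mean) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def grad_log_prob(theta, s, a, sigma):
    """Gradient of log pi_theta(a | s) w.r.t. theta for a linear-Gaussian policy.

    mean = theta . s, so d/dtheta log pi = (a - mean) / sigma^2 * s.
    """
    mean = sum(t * x for t, x in zip(theta, s))
    coeff = (a - mean) / sigma ** 2
    return [coeff * x for x in s]


def policy_gradient_step(theta, s, a, v, alpha, sigma):
    """One update: theta <- theta + alpha * grad log pi(a | s) * v."""
    g = grad_log_prob(theta, s, a, sigma)
    return [t + alpha * gi * v for t, gi in zip(theta, g)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta, sigma = [0.0, 0.0], 0.5
    s = [1.0, 2.0]
    mean = sum(t * x for t, x in zip(theta, s))
    a = rng.gauss(mean, sigma)  # sample a continuous action from the policy
    v = 1.0                     # e.g. a return or advantage estimate
    print(policy_gradient_step(theta, s, a, v, alpha=0.1, sigma=sigma))
```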

3 votes, 1 answer

Can Q-learning be used to derive a stochastic policy?

In my understanding, Q-learning gives you a deterministic policy. However, can we use some technique to build a meaningful stochastic policy from the learned Q values? I think that simply using a softmax won't work.

reinforcement-learning q-learning stochastic-policy deterministic-policy

asked Feb 08 '19 at 01:47

Hammer. Wang

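Two standard ways to turn learned Q-values into a stochastic policy are $\epsilon$-greedy and a Boltzmann (softmax) distribution with a temperature $\tau$. The sketch below computes both for a single state, using made-up Q-values. It only shows the mechanics; whether either policy is "meaningful" for a given problem is the open point of the question, and neither is claimed here to be the answer.

```python
import math


def epsilon_greedy_probs(q_values, epsilon):
    """pi(a|s) = epsilon/|A| for every action, plus (1 - epsilon) on the greedy one."""
    n = len(q_values)
    greedy = max(range(n), key=lambda i: q_values[i])  # ties -> lowest index
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def boltzmann_probs(q_values, temperature):
    """pi(a|s) proportional to exp(Q(s, a) / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]  # hypothetical Q(s, a) for three actions
    print(epsilon_greedy_probs(q, epsilon=0.1))
    print(boltzmann_probs(q, temperature=1.0))   # fairly spread out
    print(boltzmann_probs(q, temperature=0.05))  # close to greedy
```

As $\tau \to 0$ the Boltzmann policy approaches the greedy (deterministic) policy, and as $\tau$ grows it approaches the uniform distribution.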

2 votes, 1 answer

Is Q-learning only capable of learning a deterministic policy?

I was following a reinforcement learning course on Coursera, and in this video at 2:57 the instructor says, "Expected SARSA and SARSA both allow us to learn an optimal $\epsilon$-soft policy, but Q-learning does not." From what I understand, SARSA and…

reinforcement-learning q-learning policies sarsa deterministic-policy

asked May 25 '22 at 18:09

ketan dhanuka

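The distinction the instructor draws shows up in the bootstrap targets. For a transition $(s, a, r, s')$ whose next action $a'$ comes from an $\epsilon$-greedy behaviour policy, SARSA uses the sampled $Q(s', a')$, Expected SARSA uses the expectation of $Q(s', \cdot)$ under that $\epsilon$-greedy policy, and Q-learning uses $\max_{a'} Q(s', a')$. Q-learning therefore evaluates the greedy policy regardless of how the agent actually behaves. A minimal sketch with made-up numbers:

```python
def epsilon_greedy_probs(q_next, epsilon):
    """Action probabilities of an epsilon-greedy policy over q_next."""
    n = len(q_next)
    greedy = max(range(n), key=lambda i: q_next[i])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sarsa_target(r, gamma, q_next, a_next):
    # Bootstraps from the action actually taken next (on-policy).
    return r + gamma * q_next[a_next]


def expected_sarsa_target(r, gamma, q_next, epsilon):
    # Bootstraps from the expected value under the epsilon-greedy policy.
    probs = epsilon_greedy_probs(q_next, epsilon)
    return r + gamma * sum(p * q for p, q in zip(probs, q_next))


def q_learning_target(r, gamma, q_next):
    # Bootstraps from the greedy action, whatever the behaviour policy does.
    return r + gamma * max(q_next)


if __name__ == "__main__":
    q_next = [1.0, 3.0]  # hypothetical Q(s', .) for two actions
    r, gamma, eps = 0.0, 0.9, 0.2
    print(sarsa_target(r, gamma, q_next, a_next=0))  # exploratory action taken
    print(expected_sarsa_target(r, gamma, q_next, eps))
    print(q_learning_target(r, gamma, q_next))
```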

2 votes, 1 answer

Determining to terminate at a reward or not

I am practicing the Bellman equation on grid world examples, and in this scenario, there are numbered grid squares where the agent can choose to terminate and collect the reward equal to the amount inside the numbered square, or they can choose to not…

reinforcement-learning deep-rl bellman-equations deterministic-policy stopping-conditions

asked Apr 16 '22 at 08:27

Krellex

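Under the assumptions of deterministic moves, zero reward per step, and a discount factor $\gamma$ (none of which come from the question), the decision in a numbered square is a max over two options: $V(s) = \max\big(R(s),\ \gamma \max_{s'} V(s')\big)$, where $R(s)$ is the number in the square and $s'$ ranges over its neighbours. Unnumbered squares only have the second option. A minimal value-iteration sketch on a hypothetical 1-D corridor:

```python
def value_iteration_with_stopping(stop_reward, gamma=0.9, tol=1e-10):
    """Value iteration on a 1-D corridor (at least two cells) with a 'terminate' option.

    stop_reward[i] is the reward for terminating in cell i, or None if cell i
    has no number. Moving left/right is deterministic and gives reward 0.
    """
    n = len(stop_reward)
    v = [0.0] * n
    while True:
        new_v = []
        for i in range(n):
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
            continue_value = gamma * max(v[j] for j in neighbours)
            if stop_reward[i] is None:
                new_v.append(continue_value)
            else:
                new_v.append(max(stop_reward[i], continue_value))
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def stop_or_continue(stop_reward, v, gamma=0.9):
    """Greedy choice per cell given converged values: 'stop' only if it pays at least as much."""
    n = len(v)
    choices = []
    for i, r in enumerate(stop_reward):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        continue_value = gamma * max(v[j] for j in neighbours)
        choices.append("stop" if r is not None and r >= continue_value else "move")
    return choices


if __name__ == "__main__":
    rewards = [1, None, None, 10]  # hypothetical numbered squares at both ends
    v = value_iteration_with_stopping(rewards)
    print(v)                              # approximately [7.29, 8.1, 9.0, 10.0]
    print(stop_or_continue(rewards, v))  # ['move', 'move', 'move', 'stop']
```

In this example, terminating on the "1" is not worth it, because walking three squares to the "10" is still worth $0.9^3 \cdot 10 = 7.29$.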

2 votes, 1 answer

Is a learned policy, for a deterministic problem, trained in a supervised process, a stochastic policy?

Suppose I train a neural network with 4 outputs (one for each action: move down, up, left, and right) to move an agent through a grid (a deterministic problem). The output of the neural network is a probability distribution over the 4 actions, due to the…

neural-networks policies deterministic-policy stochastic-policy softmax-policy

asked Feb 03 '21 at 12:47

Xtalker

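Whether such a network acts as a deterministic or a stochastic policy depends on how its output is turned into an action. Taking the argmax of the softmax output gives a deterministic policy, while sampling from it gives a stochastic one. A minimal sketch, where the logits are hypothetical stand-ins for the network's output layer:

```python
import math
import random

ACTIONS = ["down", "up", "left", "right"]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def act_greedy(logits):
    """Deterministic: the same state (same logits) always yields the same action."""
    probs = softmax(logits)
    return ACTIONS[max(range(len(probs)), key=lambda i: probs[i])]


def act_sample(logits, rng=random):
    """Stochastic: actions are drawn according to the softmax probabilities."""
    return rng.choices(ACTIONS, weights=softmax(logits), k=1)[0]


if __name__ == "__main__":
    logits = [0.2, 2.5, -1.0, 0.3]  # hypothetical network output for one state
    print(softmax(logits))
    print(act_greedy(logits))  # "up"
    rng = random.Random(1)
    print([act_sample(logits, rng) for _ in range(5)])
```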

2 votes, 1 answer

Did AlphaGo Zero actually beat AlphaGo 100 games to 0?

tl;dr Did AlphaGo and AlphaGo Zero play 100 repetitions of the same sequence of boards, or were there 100 different games? Background: AlphaGo was the first superhuman Go player, but it had human tuning and training. AlphaGo Zero learned to be more…

alphago-zero alphago deterministic-policy stochastic-policy

asked Oct 21 '20 at 14:33

EngrStudent

2 votes, 0 answers

Do we assume the policy to be deterministic when proving the optimality?

In reinforcement learning, when we talk about the principle of optimality, do we assume the policy to be deterministic?

reinforcement-learning proofs policies deterministic-policy

asked Aug 18 '20 at 09:32

hakiki_makato

2 votes, 1 answer

What is the motivation behind using a deterministic policy?

What is the motivation behind using a deterministic policy? Given that the environment is uncertain, a stochastic policy seems to make more sense.

reinforcement-learning deterministic-policy

asked Apr 04 '19 at 17:43

ycenycute

0 votes, 0 answers

What is an example of an optimal stochastic policy that assigns a nonzero probability to an action with a lower expected value?

A stochastic policy means that an agent has probabilities of choosing its available actions, given a state: $\pi(a|s)$. However, for a given state in an optimal stochastic policy, you would assume that there would be a single optimal action that…

reinforcement-learning policies deterministic-policy stochastic-policy

asked Oct 21 '22 at 22:07

Nova

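The step this question turns on can be written out. For any policy in a fully observed MDP, the state value is the policy-weighted average of the action values:

$$ v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s, a) \le \max_{a} q_\pi(s, a), $$

with equality exactly when $\pi(\cdot \mid s)$ puts all of its probability on actions that maximize $q_\pi(s, \cdot)$. Applied to an optimal policy $\pi_*$, for which $v_*(s) = \max_a q_*(s, a)$, this means $\pi_*$ can spread probability over several actions only if those actions tie for the maximum of $q_*(s, \cdot)$.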

0 votes, 1 answer

How is policy iteration capable of improving on a deterministic policy?

Given a policy $\pi$ and its improved version $\pi'$ obtained by policy iteration, we have $v_{\pi'}(s)\geq v_{\pi}(s)$ for all $s \in S$. I think the way we choose $\pi'$ makes it deterministic (unless there is a tie but let's not consider…

reinforcement-learning value-functions policy-iteration optimal-policy deterministic-policy

asked Apr 25 '22 at 19:52

Daviiid

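A minimal sketch of one policy-improvement step on a hypothetical two-state MDP (the transition table, rewards, and discount below are made up for illustration). It evaluates $\pi$, acts greedily with respect to $q_\pi$ to get a deterministic $\pi'$, and checks that $v_{\pi'}(s) \ge v_\pi(s)$ in every state. The starting policy is itself deterministic, which is the case the question asks about.

```python
# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}
GAMMA = 0.9


def q_value(v, s, a):
    """One-step lookahead: sum over s' of p * (r + gamma * v(s'))."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])


def evaluate(policy, tol=1e-12):
    """Iterative policy evaluation for a deterministic policy (state -> action)."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {s: q_value(v, s, policy[s]) for s in P}
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


def improve(policy):
    """Greedy policy improvement: pi'(s) = argmax_a q_pi(s, a)."""
    v = evaluate(policy)
    return {s: max(P[s], key=lambda a: q_value(v, s, a)) for s in P}


if __name__ == "__main__":
    pi = {0: 0, 1: 1}     # deterministic starting policy
    pi_new = improve(pi)  # {0: 1, 1: 0}
    v_old, v_new = evaluate(pi), evaluate(pi_new)
    print(pi_new, v_old, v_new)
    assert all(v_new[s] >= v_old[s] for s in P)  # policy improvement theorem
```

The improvement comes from changing the chosen action in states where another action has a higher $q_\pi$; the new policy does not need to be stochastic for that.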