Questions tagged [deterministic-policy]

For questions related to the concept of a "deterministic policy" (as defined in reinforcement learning).

In reinforcement learning, a deterministic policy is a function from a state to a single action.
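
As a minimal sketch of the definition above, the Python snippet below contrasts a deterministic policy (one fixed action per state) with a stochastic one (an action sampled from a state-dependent distribution). The integer state, the action names, and the probabilities are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical action set, for illustration only.
ACTIONS = ["left", "right"]


def deterministic_policy(state: int) -> str:
    """Map each state to exactly one action: the same state always gives the same action."""
    return "right" if state >= 0 else "left"


def stochastic_policy(state: int, rng: random.Random) -> str:
    """Sample an action from a distribution over actions that depends on the state."""
    p_right = 0.9 if state >= 0 else 0.1  # assumed probabilities
    return rng.choices(ACTIONS, weights=[1.0 - p_right, p_right])[0]
```

Calling `deterministic_policy(3)` repeatedly always returns the same action, whereas repeated calls to `stochastic_policy(3, rng)` may return different actions.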

12 questions
16
votes
3 answers

Is the optimal policy always stochastic if the environment is also stochastic?

Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic? Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…
8
votes
3 answers

What is the difference between a stochastic and a deterministic policy?

In reinforcement learning, there are the concepts of stochastic (or probabilistic) and deterministic policies. What is the difference between them?
5
votes
1 answer

What is the loss for policy gradients with continuous actions?

I know that policy gradients used in an environment with a discrete action space are updated with $$ \Delta \theta_{t}=\alpha \nabla_{\theta} \log \pi_{\theta}\left(a_{t} \mid s_{t}\right) v_{t} $$ where $v_t$ could be many things that represent how…
3
votes
1 answer

Can Q-learning be used to derive a stochastic policy?

In my understanding, Q-learning gives you a deterministic policy. However, can we use some technique to build a meaningful stochastic policy from the learned Q values? I think that simply using a softmax won't work.
2
votes
1 answer

Is Q-learning only capable of learning a deterministic policy?

I was following a reinforcement learning course on Coursera, and in this video at 2:57 the instructor says: Expected SARSA and SARSA both allow us to learn an optimal $\epsilon$-soft policy, but Q-learning does not. From what I understand, SARSA and…
2
votes
1 answer

Determining to terminate at a reward or not

I am practicing the Bellman equation on Grid world examples and in this scenario, there are numbered grid squares where the agent can choose to terminate and collect the reward equal to the amount inside the numbered square or they can choose to not…
2
votes
1 answer

Is a learned policy, for a deterministic problem, trained in a supervised process, a stochastic policy?

Suppose I trained a neural network with 4 outputs (one for each action: move down, up, left, and right) to move an agent through a grid (a deterministic problem). The output of the neural network is a probability distribution over the 4 actions, due to the…
2
votes
1 answer

Did AlphaGo Zero actually beat AlphaGo 100 games to 0?

tl;dr Did AlphaGo Zero and AlphaGo play 100 repetitions of the same sequence of boards, or were there 100 different games? Background: AlphaGo was the first superhuman Go player, but it had human tuning and training. AlphaGo Zero learned to be more…
2
votes
0 answers

Do we assume the policy to be deterministic when proving the optimality?

In reinforcement learning, when we talk about the principle of optimality, do we assume the policy to be deterministic?
2
votes
1 answer

What is the motivation behind using a deterministic policy?

What is the motivation behind using a deterministic policy? Given that the environment is uncertain, it seems that a stochastic policy makes more sense.
0
votes
0 answers

What is an example of an *optimal* stochastic policy that assigns a nonzero probability to an action with a lower expected value?

A stochastic policy means that an agent has probabilities of choosing each of its available actions, given a state: $\pi(a|s)$. However, in an optimal stochastic policy for a given state, you would assume that there would be a single optimal action that…
0
votes
1 answer

How is policy iteration capable of improving on a deterministic policy?

Given a policy $\pi$ and its improved version $\pi'$ obtained by policy iteration, we have $v_{\pi'}(s)\geq v_{\pi}(s)$ for all $s \in S$. I think the way we choose $\pi'$ makes it deterministic (unless there is a tie, but let's not consider…