For questions about the softmax policy in the context of reinforcement learning and other AI sub-fields.
Questions tagged [softmax-policy]
4 questions
7 votes, 1 answer
What happens when you select actions using softmax instead of epsilon greedy in DQN?
I understand the two major branches of RL are Q-Learning and Policy Gradient methods.
From my understanding (correct me if I'm wrong), policy gradient methods have exploration built in, as they select actions using a probability…

Linsu Han
4 votes, 1 answer
Eligibility vector for softmax policy with policy gradients
There is this nice result for policy gradients that the gradient of some performance measure, $\nabla v_{\pi_{\theta}}(s_0)$ (here, in the episodic case for the starting state $s_0$ and policy $\pi$, parametrised by some weights $\theta$) is equal…
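As a hedged illustration of the term in this question's title: assuming (this is not stated in the excerpt) a softmax policy with linear action preferences $h(s, a, \theta) = \theta^\top x(s, a)$, where $x(s, a)$ is a hypothetical feature vector, the eligibility vector $\nabla_\theta \ln \pi_\theta(a \mid s)$ takes a simple form:

$$\pi_\theta(a \mid s) = \frac{e^{\theta^\top x(s, a)}}{\sum_b e^{\theta^\top x(s, b)}}, \qquad \nabla_\theta \ln \pi_\theta(a \mid s) = x(s, a) - \sum_b \pi_\theta(b \mid s)\, x(s, b)$$

That is, under this assumption the eligibility vector is the feature vector of the chosen action minus the expected feature vector under the current policy.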

Gregor
3 votes, 1 answer
What is the difference between the $\epsilon$-greedy and softmax policies?
Could someone explain to me what the key difference is between the $\epsilon$-greedy policy and the softmax policy, in particular in the context of the SARSA and Q-Learning algorithms? I understood the main difference between these two algorithms, but…
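To make the comparison in this question concrete, here is a minimal sketch of the action probabilities each policy assigns to the same action-value estimates. The function names, the `temperature` parameter, and the example values are assumptions chosen for illustration; they do not come from the question or its answer.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Action probabilities proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def epsilon_greedy_probs(q_values, epsilon=0.1):
    """Greedy action gets 1 - epsilon plus its uniform share; the rest share epsilon equally."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(probs, rng):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical action-value estimates for a single state.
q = [1.0, 2.0, 0.5, 2.1]
rng = random.Random(0)
print("softmax:       ", softmax_probs(q))
print("epsilon-greedy:", epsilon_greedy_probs(q))
print("sampled action:", sample_action(softmax_probs(q), rng))
```

In this sketch, $\epsilon$-greedy gives every non-greedy action the same probability regardless of its estimated value, while softmax ranks the non-greedy actions by their estimates, so a nearly optimal action is explored far more often than a clearly poor one.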

FraMan
2 votes, 1 answer
Is a learned policy, for a deterministic problem, trained in a supervised process, a stochastic policy?
Suppose I trained a neural network with 4 outputs (one for each action: move down, up, left, and right) to move an agent through a grid (a deterministic problem). The output of the neural network is a probability distribution over the 4 actions, due to the…

Xtalker