Questions tagged [softmax-policy]

For questions about the softmax policy in the context of reinforcement learning and other AI sub-fields.

4 questions
7 votes, 1 answer

What happens when you select actions using softmax instead of epsilon greedy in DQN?

I understand the two major branches of RL are Q-Learning and Policy Gradient methods. From my understanding (correct me if I'm wrong), policy gradient methods have inherent exploration built in, as they select actions using a probability…
4 votes, 1 answer

Eligibility vector for softmax policy with policy gradients

There is this nice result for policy gradients that the gradient of some performance measure, $\nabla v_{\pi_{\theta}}(s_0)$ (here, in the episodic case for the starting state $s_0$ and policy $\pi$, parametrised by some weights $\theta$) is equal…
3 votes, 1 answer

What is the difference between the $\epsilon$-greedy and softmax policies?

Could someone explain to me what the key difference is between the $\epsilon$-greedy policy and the softmax policy? In particular, in the context of the SARSA and Q-Learning algorithms. I understood the main difference between these two algorithms, but…
2 votes, 1 answer

Is a learned policy, for a deterministic problem, trained in a supervised process, a stochastic policy?

Suppose I trained a neural network with 4 outputs (one for each action: move down, up, left, and right) to move an agent through a grid (a deterministic problem). The output of the neural network is a probability distribution over the 4 actions, due to the…