
A stochastic policy means that an agent has probabilities of choosing their available actions, given a state: $\pi(a|s)$.

However, in an optimal stochastic policy for a given state, you would expect a single optimal action to be assigned a probability of 1, since it yields the highest expected value. Thus, an optimal stochastic policy would collapse into a deterministic policy.

Of course, sometimes there are multiple actions that are all tied for the highest expected value, so they are each assigned an equal probability. For example, in the adversarial game of Rock-Paper-Scissors, each action would be assigned a probability of one third. A deterministic policy in this case would give the adversary a huge advantage.
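To make the Rock-Paper-Scissors claim concrete, here is a small sketch (the payoff matrix is the standard zero-sum one; the array names are mine) showing that the uniform mixed policy earns expected payoff 0 against every adversary action, while any deterministic policy loses to its counter:

```python
import numpy as np

# Row player's payoffs in Rock-Paper-Scissors:
# rows = our action, cols = adversary's action (R, P, S); +1 win, -1 loss, 0 draw.
payoffs = np.array([
    [ 0, -1,  1],  # Rock
    [ 1,  0, -1],  # Paper
    [-1,  1,  0],  # Scissors
])

# Uniform mixed policy: expected payoff is 0 against every pure adversary
# action, so the adversary gains nothing no matter what they play.
uniform = np.array([1/3, 1/3, 1/3])
print(uniform @ payoffs)  # [0. 0. 0.]

# A deterministic policy, by contrast, is exploitable by the counter-action:
always_rock = np.array([1.0, 0.0, 0.0])
print(always_rock @ payoffs)  # [ 0. -1.  1.] -- Paper beats us every time
```

All three tied actions must get equal weight here precisely because any imbalance would give the adversary a best response.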

My question:

Is there an example of a situation/scenario/game where an optimal policy is both stochastic AND assigns a nonzero probability to an action with a lower expected value than the action with the highest expected value?

Edit:

I was thinking maybe poker could be an example, but in what scenario would I ever want an action with a lower expected value to have a nonzero probability of being chosen?

Maybe a situation where we would have to take the variance of the expected value into account (perhaps the state has missing information). For example, maybe action B has a lower expected value than action A, but a higher variance, so exploring action B would be useful in reducing uncertainty in the value estimates?
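The variance idea above is essentially what upper-confidence-bound (UCB) action selection formalises during learning: an action with a lower estimated mean but fewer samples (hence more uncertainty) can still be the one chosen. A minimal sketch, with made-up running estimates and an assumed exploration constant:

```python
import math

# Hypothetical running estimates after some interaction:
# action A has the higher estimated mean but has been sampled often;
# action B has a lower mean but has barely been tried.
means  = {"A": 0.60, "B": 0.50}
counts = {"A": 100,  "B": 5}
total  = sum(counts.values())
c = 1.0  # exploration constant (assumed)

def ucb(action):
    # UCB1 score: estimated mean plus an uncertainty bonus that
    # shrinks as the action is sampled more.
    return means[action] + c * math.sqrt(math.log(total) / counts[action])

choice = max(means, key=ucb)
print(choice)  # "B" -- the lower-mean, under-explored action wins the bonus
```

Note this makes the *learning-time* behaviour stochastic-looking, which is a different thing from the final optimal policy itself being stochastic, as in the adversarial case.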
