Questions tagged [optimality]
For questions about the optimality of AI algorithms and solutions.
13 questions
5
votes
2 answers
Given two optimal policies, is an affine combination of them also optimal?
If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will a linear (or affine) combination of the two policies, $\alpha \pi_1 + \beta \pi_2$ with $\alpha + \beta = 1$, also be an optimal policy?
Here I…

yang liu
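A note on why the answer is yes for convex combinations (with a negative weight, an affine combination need not be a valid probability distribution): in a finite MDP an optimal policy places probability only on actions that are greedy with respect to $q_*$, and mixing preserves that property. A sketch:

```latex
% Sketch, assuming alpha, beta >= 0 and alpha + beta = 1.
% Each optimal pi_i satisfies: pi_i(a|s) > 0  =>  a maximizes q_*(s, .).
% The mixture inherits this, since its support is the union of the supports:
\[
\pi(a \mid s) = \alpha\,\pi_1(a \mid s) + \beta\,\pi_2(a \mid s) > 0
\;\Longrightarrow\;
a \in \operatorname*{arg\,max}_{a'} q_*(s, a'),
\]
% so pi is greedy with respect to q_* and therefore optimal:
\[
v_\pi(s) = \max_a q_*(s, a) = v_*(s) \quad \text{for all } s \in \mathcal{S}.
\]
```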
3
votes
2 answers
How to make minimax optimal?
By optimal I mean that:
If max has a winning strategy, then minimax will return the strategy for max with the fewest moves to win.
If min has a winning strategy, then minimax will return the strategy for max with the most moves to…

Aadit M Shah
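A common fix, sketched below: fold the search depth into the terminal score, so faster wins and slower losses score higher. The toy Nim game here (take 1 or 2 stones; taking the last stone wins) is only an illustration:

```python
WIN = 1000  # any constant larger than the maximum reachable depth

def minimax(pile, maximizing, depth=0):
    """Minimax that prefers the fastest win and the slowest loss.

    A win found at depth d scores WIN - d and a loss scores -WIN + d,
    so among equally winning (or losing) strategies the shallower win
    (or deeper loss) is preferred.
    """
    if pile == 0:
        # The player who just moved took the last stone and won.
        return (-WIN + depth) if maximizing else (WIN - depth)
    scores = [minimax(pile - take, not maximizing, depth + 1)
              for take in (1, 2) if take <= pile]
    return max(scores) if maximizing else min(scores)

print(minimax(3, True))  # -998: the player to move loses, at depth 2
print(minimax(4, True))  # 997: a win in 3 plies (take 1, leaving a pile of 3)
```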
3
votes
1 answer
Is there an error in the A* optimality proof in Russell & Norvig, 4th edition?
In "AI: A Modern Approach", 4th edition, by Russell and Norvig, they give a purported proof that A* is cost-optimal for any admissible heuristic. The given proof seems almost certainly wrong. They want to show that all nodes on the optimal path are…

vdbuss
3
votes
1 answer
How is $v_*(s) = \max_{\pi} v_\pi(s)$ also applicable in the case of stochastic policies?
I am reading Sutton & Barto's book "Reinforcement Learning: An Introduction". In this book, they define the optimal value function as:
$$v_*(s) = \max_{\pi} v_\pi(s),$$ for all $s \in \mathcal{S}$.
Do we take the max over all deterministic policies,…

Tamar
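One way to see it, as a sketch for finite MDPs: the value of a stochastic policy at a state is a convex combination of action values, so it can never exceed the best single action, and the max over all policies is therefore attained by a deterministic one:

```latex
% For any policy pi, possibly stochastic:
\[
v_\pi(s) = \sum_a \pi(a \mid s)\, q_\pi(s, a)
\;\le\; \max_a q_\pi(s, a)
\;\le\; \max_a q_*(s, a) = v_*(s),
\]
% so enlarging the max to include stochastic policies does not change v_*.
```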
3
votes
2 answers
If uniform cost search is used for bidirectional search, is it guaranteed the solution is optimal?
If uniform cost search is used for both the forward and backward search in bidirectional search, is it guaranteed the solution is optimal?
user42125
2
votes
1 answer
Can we achieve optimality with minimax using an evaluation function?
The following quote (from AIMA) refers to the situation in which the minimax algorithm computes its values directly from the terminal states.
(The) definition of optimal play for MAX assumes that MIN also plays optimally—it maximizes the worst-case…

Kestina
1
vote
1 answer
How are these two equations for the optimal state-value function equivalent?
By substituting the optimal policy $\pi_{\star}$ into the Bellman equation, we get the Bellman equation for $v_{\pi_{\star}}(s)=v_{\star}(s)$:
$$ v_{\star}(s) = \sum\limits_a \pi_{\star}(a|s) \sum\limits_{s'} \sum_r p(s', r | s, a)[r + \gamma…

DSPinfinity
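The step that connects the two forms, in Sutton & Barto's notation: $\pi_*$ puts probability only on actions that maximize the bracketed expectation, so the weighted sum over $a$ collapses to a max:

```latex
\[
v_*(s)
= \sum_a \pi_*(a \mid s) \sum_{s'} \sum_r p(s', r \mid s, a)\bigl[r + \gamma v_*(s')\bigr]
= \max_a \sum_{s'} \sum_r p(s', r \mid s, a)\bigl[r + \gamma v_*(s')\bigr].
\]
```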
1
vote
1 answer
Are optimal policies always deterministic, or can there also be optimal policies that are stochastic?
Let $M$ be an MDP with two states, $A$ and $B$, where $A$ is the starting state and you always transition to the final state $B$ using one of two possible actions. $A_1$ gives you rewards that are normally distributed $\mathcal{N}(0, 1)$ and $A_2$ gives you…

Abc1729
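A hedged note, assuming (as in the usual version of this example) that both actions have the same mean reward: optimality depends only on expected return, so when $q_*(A, A_1) = q_*(A, A_2)$ every mixture of the two actions is optimal, alongside the two deterministic policies:

```latex
% For any p in [0, 1], the policy pi(A_1|A) = p, pi(A_2|A) = 1 - p satisfies
\[
v_\pi(A) = p\, q_*(A, A_1) + (1 - p)\, q_*(A, A_2) = v_*(A),
\]
% so stochastic optimal policies exist whenever the optimal action is not unique.
```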
1
vote
0 answers
Minimax algorithm with only partial visibility
I'm trying to implement the minimax algorithm with alpha beta pruning on a game that works like this:
Player 1 plays (x1, y1).
Player 2 can only see the x-value (x1) that Player 1 played (and not the y-value y1). Player 2 plays (x2, y2).
An…

SpiderManCity
1
vote
2 answers
Why is the optimal policy for an infinite horizon MDP deterministic?
Could someone please help me gain some intuition as to why the optimal policy for a Markov Decision Process in the infinite horizon case (agent acts forever) is deterministic?

stoic-santiago
0
votes
1 answer
Can A* be non-optimal if it uses an admissible but inconsistent heuristic with graph search?
The book "Artificial Intelligence: A Modern Approach" (4th edition, global version) says
"With an admissible heuristic, A* is cost-optimal...".
An admissible heuristic is one that never overestimates the cost to reach the goal, while a consistent…

numq
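A runnable sketch of the standard failure mode (the graph and heuristic values below are illustrative, not taken from the book): with a closed set and no re-expansion, A* returns cost 13 even though the optimal path S -> A -> B -> G costs 12. The heuristic is admissible everywhere but inconsistent along the edge A -> B:

```python
import heapq

def astar_no_reopen(start, goal, edges, h):
    """A* graph search with a closed set that never re-expands a node.

    With an admissible but inconsistent heuristic this variant can close
    a node via a suboptimal path and then ignore the cheaper path later.
    """
    frontier = [(h[start], 0, start)]
    closed = set()
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in edges.get(node, []):
            if nxt not in closed:
                heapq.heappush(frontier, (g + cost + h[nxt], g + cost, nxt))
    return None

# h is admissible (true costs to G: S=12, A=11, B=10) but inconsistent:
# h['A'] - h['B'] = 5 > cost(A, B) = 1.
edges = {'S': [('A', 1), ('B', 3)], 'A': [('B', 1)], 'B': [('G', 10)]}
h = {'S': 0, 'A': 6, 'B': 1, 'G': 0}

print(astar_no_reopen('S', 'G', edges, h))  # 13, but the optimal cost is 12
```

Here B is popped first with g = 3 (f = 4) and closed, so the cheaper path to B through A (g = 2), found later, is discarded.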
0
votes
1 answer
What is the equation for $\pi_*$ in terms of $q_*(s,a)$?
I am trying to solve the following exercise from Sutton and Barto:
Exercise 3.27: Give an equation for $\pi_*$ in terms of $q_*(s,a)$.
However, I am struggling to do so. I know that $\pi_*$ is the policy which will pick the action…

user
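The usual shape of the answer, as a sketch: an optimal policy can put probability only on actions that maximize $q_*$, for example the greedy assignment

```latex
\[
\pi_*(a \mid s) =
\begin{cases}
1 & \text{if } a = \operatorname*{arg\,max}_{a'} q_*(s, a'),\\
0 & \text{otherwise,}
\end{cases}
\]
% with ties broken arbitrarily; more generally, any pi_* whose support
% lies inside the argmax set is optimal.
```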
0
votes
1 answer
Are hill climbing variations always optimal and complete?
Are hill climbing variations (like steepest ascent hill climbing, stochastic hill climbing, random restart hill climbing, local beam search) always optimal and complete?
user45792