Questions tagged [optimality]
For questions about the optimality of AI algorithms and solutions.
13 questions
5
votes
2 answers
Given two optimal policies, is an affine combination of them also optimal?
If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will a linear (or affine) combination of the two policies, $\alpha \pi_1 + \beta \pi_2$ with $\alpha + \beta = 1$, also be an optimal policy?
Here I…

yang liu
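A note on why the answer is yes for convex combinations (with a negative weight, an affine combination need not be a valid probability distribution): in a finite MDP an optimal policy places probability only on actions that are greedy with respect to $q_*$, and mixing preserves that property. A sketch:

```latex
% Sketch, assuming alpha, beta >= 0 and alpha + beta = 1.
% Each optimal pi_i satisfies: pi_i(a|s) > 0  =>  a maximizes q_*(s, .).
% The mixture inherits this, since its support is the union of the supports:
\[
\pi(a \mid s) = \alpha\,\pi_1(a \mid s) + \beta\,\pi_2(a \mid s) > 0
\;\Longrightarrow\;
a \in \operatorname*{arg\,max}_{a'} q_*(s, a'),
\]
% so pi is greedy with respect to q_* and therefore optimal:
\[
v_\pi(s) = \max_a q_*(s, a) = v_*(s) \quad \text{for all } s \in \mathcal{S}.
\]
```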
3
votes
2 answers
How to make minimax optimal?
By optimal I mean that:
If max has a winning strategy, then minimax will return the strategy for max with the fewest moves to win.
If min has a winning strategy, then minimax will return the strategy for max with the most moves to…

Aadit M Shah
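A common fix, sketched below: fold the search depth into the terminal score, so faster wins and slower losses score higher. The toy Nim game here (take 1 or 2 stones; taking the last stone wins) is only an illustration:

```python
WIN = 1000  # any constant larger than the maximum reachable depth

def minimax(pile, maximizing, depth=0):
    """Minimax that prefers the fastest win and the slowest loss.

    A win found at depth d scores WIN - d and a loss scores -WIN + d,
    so among equally winning (or losing) strategies the shallower win
    (or deeper loss) is preferred.
    """
    if pile == 0:
        # The player who just moved took the last stone and won.
        return (-WIN + depth) if maximizing else (WIN - depth)
    scores = [minimax(pile - take, not maximizing, depth + 1)
              for take in (1, 2) if take <= pile]
    return max(scores) if maximizing else min(scores)

print(minimax(3, True))  # -998: the player to move loses, at depth 2
print(minimax(4, True))  # 997: a win in 3 plies (take 1, leaving a pile of 3)
```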
3
votes
1 answer
Is there an error in the A* optimality proof in Russell & Norvig, 4th edition?
In "AI: A Modern Approach", 4th edition, by Russell and Norvig, they give a purported proof that A* is cost-optimal for any admissible heuristic. The given proof seems almost certainly wrong. They want to show that all nodes on the optimal path are…

vdbuss
3
votes
1 answer
How is $v_*(s) = \max_{\pi} v_\pi(s)$ also applicable in the case of stochastic policies?
I am reading Sutton & Barto's book "Reinforcement Learning: An Introduction". In this book, they define the optimal value function as:
$$v_*(s) = \max_{\pi} v_\pi(s),$$ for all $s \in \mathcal{S}$.
Do we take the max over all deterministic policies,…

Tamar
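One way to see it, as a sketch for finite MDPs: the value of a stochastic policy at a state is a convex combination of action values, so it can never exceed the best single action, and the max over all policies is therefore attained by a deterministic one:

```latex
% For any policy pi, possibly stochastic:
\[
v_\pi(s) = \sum_a \pi(a \mid s)\, q_\pi(s, a)
\;\le\; \max_a q_\pi(s, a)
\;\le\; \max_a q_*(s, a) = v_*(s),
\]
% so enlarging the max to include stochastic policies does not change v_*.
```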
3
votes
2 answers
If uniform cost search is used for bidirectional search, is it guaranteed the solution is optimal?
If uniform cost search is used for both the forward and backward search in bidirectional search, is it guaranteed the solution is optimal?
user42125
2
votes
1 answer
Can we achieve optimality with minimax using an evaluation function?
The following quote (from AIMA) refers to the situation in which the minimax algorithm computes its values directly from the terminal states.
(The) definition of optimal play for MAX assumes that MIN also plays optimally—it maximizes the worst-case…

Kestina
1
vote
1 answer
How are these two equations for the optimal state-value function equivalent?
By substituting the optimal policy $\pi_{\star}$ into the Bellman equation, we get the Bellman equation for $v_{\pi_{\star}}(s)=v_{\star}(s)$:
$$ v_{\star}(s) = \sum\limits_a \pi_{\star}(a|s) \sum\limits_{s'} \sum_r p(s', r | s, a)[r + \gamma…

DSPinfinity
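The step that connects the two forms, in Sutton & Barto's notation: $\pi_*$ puts probability only on actions that maximize the bracketed expectation, so the weighted sum over $a$ collapses to a max:

```latex
\[
v_*(s)
= \sum_a \pi_*(a \mid s) \sum_{s'} \sum_r p(s', r \mid s, a)\bigl[r + \gamma v_*(s')\bigr]
= \max_a \sum_{s'} \sum_r p(s', r \mid s, a)\bigl[r + \gamma v_*(s')\bigr].
\]
```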
1
vote
1 answer
Are optimal policies always deterministic, or can there also be optimal policies that are stochastic?
Let $M$ be an MDP with two states, $A$ and $B$, where $A$ is the starting state and you always transition to the final state $B$ using one of two possible actions. $A_1$ gives you rewards that are normally distributed $\mathcal{N}(0, 1)$ and $A_2$ gives you…

Abc1729
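A hedged note, assuming (as in the usual version of this example) that both actions have the same mean reward: optimality depends only on expected return, so when $q_*(A, A_1) = q_*(A, A_2)$ every mixture of the two actions is optimal, alongside the two deterministic policies:

```latex
% For any p in [0, 1], the policy pi(A_1|A) = p, pi(A_2|A) = 1 - p satisfies
\[
v_\pi(A) = p\, q_*(A, A_1) + (1 - p)\, q_*(A, A_2) = v_*(A),
\]
% so stochastic optimal policies exist whenever the optimal action is not unique.
```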
1
vote
0 answers
Minimax algorithm with only partial visibility
I'm trying to implement the minimax algorithm with alpha beta pruning on a game that works like this:
Player 1 plays (x1, y1).
Player 2 can only see the x-value (x1) that Player 1 played (and not the y-value y1). Player 2 plays (x2, y2).
An…

SpiderManCity
1
vote
2 answers
Why is the optimal policy for an infinite horizon MDP deterministic?
Could someone please help me gain some intuition as to why the optimal policy for a Markov Decision Process in the infinite horizon case (agent acts forever) is deterministic?

stoic-santiago
0
votes
1 answer
Can A* be non-optimal if it uses an admissible but inconsistent heuristic with graph search?
The book "Artificial Intelligence: A Modern Approach" (4th edition, global version) says
"With an admissible heuristic, A* is cost-optimal...".
An admissible heuristic is one that never overestimates the cost to reach the goal, while a consistent…

numq
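A runnable sketch of the standard failure mode (the graph and heuristic values below are illustrative, not taken from the book): with a closed set and no re-expansion, A* returns cost 13 even though the optimal path S -> A -> B -> G costs 12. The heuristic is admissible everywhere but inconsistent along the edge A -> B:

```python
import heapq

def astar_no_reopen(start, goal, edges, h):
    """A* graph search with a closed set that never re-expands a node.

    With an admissible but inconsistent heuristic this variant can close
    a node via a suboptimal path and then ignore the cheaper path later.
    """
    frontier = [(h[start], 0, start)]
    closed = set()
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in edges.get(node, []):
            if nxt not in closed:
                heapq.heappush(frontier, (g + cost + h[nxt], g + cost, nxt))
    return None

# h is admissible (true costs to G: S=12, A=11, B=10) but inconsistent:
# h['A'] - h['B'] = 5 > cost(A, B) = 1.
edges = {'S': [('A', 1), ('B', 3)], 'A': [('B', 1)], 'B': [('G', 10)]}
h = {'S': 0, 'A': 6, 'B': 1, 'G': 0}

print(astar_no_reopen('S', 'G', edges, h))  # 13, but the optimal cost is 12
```

Here B is popped first with g = 3 (f = 4) and closed, so the cheaper path to B through A (g = 2), found later, is discarded.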
0
votes
1 answer
What is the equation for $\pi_*$ in terms of $q_*(s,a)$?
I am trying to solve the following exercise from Sutton and Barto:
Exercise 3.27: Give an equation for $\pi_*$ in terms of $q_*(s,a)$.
However, I am struggling to do so. I know that $\pi_*$ is the policy which will pick the action…

user
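The usual shape of the answer, as a sketch: an optimal policy can put probability only on actions that maximize $q_*$, for example the greedy assignment

```latex
\[
\pi_*(a \mid s) =
\begin{cases}
1 & \text{if } a = \operatorname*{arg\,max}_{a'} q_*(s, a'),\\
0 & \text{otherwise,}
\end{cases}
\]
% with ties broken arbitrarily; more generally, any pi_* whose support
% lies inside the argmax set is optimal.
```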
0
votes
1 answer
Are hill climbing variations always optimal and complete?
Are hill climbing variations (like steepest ascent hill climbing, stochastic hill climbing, random restart hill climbing, local beam search) always optimal and complete?
user45792