Questions tagged [pomdp]
For questions related to the concept of the Partially Observable Markov Decision Process (POMDP), a generalization of the Markov Decision Process (MDP) to cases where information about the state is incomplete (or only partially observable).
37 questions
12 votes · 2 answers
Is there a fundamental difference between an environment being stochastic and being partially observable?
In AI literature, deterministic vs. stochastic and fully observable vs. partially observable are usually considered two distinct properties of the environment.
I'm confused about this because what appears random can be described by hidden…

martinkunev (233)
10 votes · 1 answer
Can Q-learning be used in a POMDP?
Can Q-learning (and SARSA) be directly used in a Partially Observable Markov Decision Process (POMDP)? If not, why not? My intuition is that the policies learned will be terrible because of partial observability. Are there ways to transform these…

drerD (298)
6 votes · 3 answers
What exactly are partially observable environments?
I have trouble understanding the meaning of partially observable environments. Here's my doubt.
According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
6 votes · 0 answers
How exactly does self-play work, and how does it relate to MCTS?
I am working towards using RL to create an AI for a two-player, hidden-information, turn-based board game. I have just finished David Silver's RL course and Denny Britz's coding exercises, and so am relatively familiar with MC control, SARSA,…

Alienator (61)
6 votes · 2 answers
Are perfect and imperfect information games modelled as fully and partially observable environments, respectively?
In perfect information games, the agent can see all the moves performed in the past. It can also observe the next action that the opponent will put into practice.
In this case, can we say that perfect information games are actually a…

Goktug (161)
5 votes · 1 answer
What could happen if we wrongly assume that the POMDP is an MDP?
Consider the Breakout environment.
We know that the underlying world behaves like an MDP, because, for the evolution of the system, it just needs to know the current state (i.e. the position, speed, and direction of motion of the ball, the positions of the…

Marco Favorito (185)
4 votes · 1 answer
Are multi-agent or self-play environments always automatically POMDPs?
As part of my thesis, I'm working on a zero-sum game in which an agent is trained with RL.
The game is a real-time game, a derivative of Pong; one could imagine playing Pong with both sides being foosball rods.
As I see it, this is an MDP with perfect…

kitaird (115)
4 votes · 0 answers
How to update the observation probabilities in a POMDP?
How can I update the observation probabilities for a POMDP (or HMM) in order to have a more accurate prediction model?
The POMDP relies on observation probabilities that match an observation to a state. This poses an issue, as the probabilities are…

Pluxyy (85)
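For context on the question above: for a discrete HMM-style observation model, observation probabilities are usually re-estimated from data with the Baum-Welch (EM) update. In the notation assumed here (not taken from the question), $\gamma_t(s)$ is the smoothed posterior $\mathbb{P}(s_t = s \mid o_{1:T})$ from the forward-backward pass, and the re-estimated emission probability is
$$\hat{O}(o \mid s) = \frac{\sum_{t=1}^{T} \gamma_t(s)\, \mathbb{1}[o_t = o]}{\sum_{t=1}^{T} \gamma_t(s)},$$
i.e. the expected fraction of visits to state $s$ in which $o$ was actually observed.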
4 votes · 1 answer
Why is tic-tac-toe considered a non-deterministic environment?
I have been reading about deterministic and stochastic environments, when I came across an article that states that tic-tac-toe is a non-deterministic environment.
But why is that?
An action will lead to a known state of the game and an agent has…

EEAH (193)
4 votes · 0 answers
Is there a way to do reinforcement learning in POMDP?
Are there any reinforcement learning algorithms for learning optimal policies in a partially observable Markov decision process (POMDP), i.e. when the state is not perfectly observed? More specifically, how does one update the belief state using…

Deepanshu Vasal (41)
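For reference on the belief-update part of the question above: with transition model $T(s' \mid s, a)$, observation model $O(o \mid s', a)$, and current belief $b$ (notation assumed here, not taken from the question), the standard Bayes-filter update after taking action $a$ and receiving observation $o$ is
$$b'(s') = \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}{\sum_{s''} O(o \mid s'', a) \sum_{s} T(s'' \mid s, a)\, b(s)}.$$
RL methods for POMDPs typically either maintain this belief explicitly (belief-MDP methods) or approximate it, for example with a recurrent memory over observations.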
3 votes · 0 answers
Can we use a Gaussian process to approximate the belief distribution at every instant in a POMDP?
Suppose $x_{t+1} \sim \mathbb{P}(\cdot | x_t, a_t)$ denotes the state transition dynamics in a reinforcement learning (RL) problem. Let $y_{t+1} \sim \mathbb{P}(\cdot | x_{t+1})$ denote the noisy observation or the imperfect state information. Let…

math_phile (56)
3 votes · 0 answers
Is Monte Carlo tree search needed in partially observable environments during gameplay?
I understand that with a fully observable environment (chess, Go, etc.) you can run MCTS with an optimal policy network for future planning purposes. This allows you to pick actions for gameplay, which will result in the maximum expected return from…

Yohahn Ribeiro (31)
3 votes · 1 answer
What is the intuition behind grid-based solutions to POMDPs?
After spending some time reading about POMDPs, I'm still having a hard time understanding how grid-based solutions work.
I understand the finite-horizon brute-force solution, where you have your current belief distribution, enumerate every possible…

FourierFlux (783)
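As a brief note on the question above, the usual intuition for grid-based methods is: discretize the belief simplex into a finite set of grid beliefs $\{b_i\}$, run value iteration only at those points, and evaluate any other belief by convex interpolation. In the notation assumed for this sketch,
$$\tilde{V}(b) = \sum_i \lambda_i V(b_i) \quad \text{where } b = \sum_i \lambda_i b_i,\ \lambda_i \ge 0,\ \sum_i \lambda_i = 1,$$
so the value function is approximated over a fixed grid instead of over the exponentially many beliefs reachable in the finite-horizon brute-force approach.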
3 votes · 1 answer
Is it possible for value-based methods to learn stochastic policies?
Is it possible for value-based methods to learn stochastic policies? I'm trying to get a clear picture of the different categories of RL algorithms, and while doing so I started to think about settings where the optimal policy is stochastic…

Krrrl (211)
2 votes · 1 answer
Is my understanding of the differences between MDP, Semi-MDP and POMDP correct?
I just wanted to confirm that my understanding of the different Markov Decision Processes is correct, because they are the fundamentals of reinforcement learning. Also, I read a few literature sources, and some are not consistent with each other.…

Rui Nian (423)