Questions tagged [pomdp]
For questions related to the concept of the Partially Observable Markov Decision Process (POMDP), a generalization of the Markov Decision Process (MDP) to cases where information about the state is incomplete (or only partially observable).
37 questions
12 votes · 2 answers
Is there a fundamental difference between an environment being stochastic and being partially observable?
In AI literature, deterministic vs. stochastic and fully observable vs. partially observable are usually considered two distinct properties of the environment.
I'm confused about this because what appears random can be described by hidden…

martinkunev (233)
10 votes · 1 answer
Can Q-learning be used in a POMDP?
Can Q-learning (and SARSA) be directly used in a Partially Observable Markov Decision Process (POMDP)? If not, why not? My intuition is that the policies learned will be terrible because of partial observability. Are there ways to transform these…

drerD (298)
6 votes · 3 answers
What exactly are partially observable environments?
I have trouble understanding the meaning of partially observable environments. Here's my doubt.
According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
6 votes · 0 answers
How exactly does self-play work, and how does it relate to MCTS?
I am working towards using RL to create an AI for a two-player, hidden-information, turn-based board game. I have just finished David Silver's RL course and Denny Britz's coding exercises, and so am relatively familiar with MC control, SARSA,…

Alienator (61)
6 votes · 2 answers
Are perfect and imperfect information games modelled as fully and partially observable environments, respectively?
In perfect information games, the agent can see all the moves performed in the past. It can also observe the next action that the opponent will put into practice.
In this case, can we say that perfect information games are actually a…

Goktug (161)
5 votes · 1 answer
What could happen if we wrongly assume that the POMDP is an MDP?
Consider the Breakout environment.
We know that the underlying world behaves like an MDP, because, for the evolution of the system, it just needs to know the current state (i.e. the position, speed, and direction of motion of the ball, the positions of the…

Marco Favorito (185)
4 votes · 1 answer
Are multi-agent or self-play environments always automatically POMDPs?
As part of my thesis, I'm working on a zero-sum game in which an agent is trained with RL.
The game is a real-time game, a derivative of Pong; one could imagine playing Pong with both sides being foosball rods.
As I see it, this is an MDP with perfect…

kitaird (115)
4 votes · 0 answers
How to update the observation probabilities in a POMDP?
How can I update the observation probabilities for a POMDP (or HMM) in order to have a more accurate prediction model?
The POMDP relies on observation probabilities that match an observation to a state. This poses an issue, as the probabilities are…

Pluxyy (85)
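For context on the question above: for a discrete HMM-style observation model, observation probabilities are usually re-estimated from data with the Baum-Welch (EM) update. In the notation assumed here (not taken from the question), $\gamma_t(s)$ is the smoothed posterior $\mathbb{P}(s_t = s \mid o_{1:T})$ from the forward-backward pass, and the re-estimated emission probability is
$$\hat{O}(o \mid s) = \frac{\sum_{t=1}^{T} \gamma_t(s)\, \mathbb{1}[o_t = o]}{\sum_{t=1}^{T} \gamma_t(s)},$$
i.e. the expected fraction of visits to state $s$ in which $o$ was actually observed.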
4 votes · 1 answer
Why is tic-tac-toe considered a non-deterministic environment?
I have been reading about deterministic and stochastic environments, when I came across an article that states that tic-tac-toe is a non-deterministic environment.
But why is that?
An action will lead to a known state of the game and an agent has…

EEAH (193)
4 votes · 0 answers
Is there a way to do reinforcement learning in POMDP?
Are there any reinforcement learning algorithms for learning optimal policies in a partially observable Markov decision process (POMDP), i.e. when the state is not perfectly observed? More specifically, how does one update the belief state using…

Deepanshu Vasal (41)
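For reference on the belief-update part of the question above: with transition model $T(s' \mid s, a)$, observation model $O(o \mid s', a)$, and current belief $b$ (notation assumed here, not taken from the question), the standard Bayes-filter update after taking action $a$ and receiving observation $o$ is
$$b'(s') = \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}{\sum_{s''} O(o \mid s'', a) \sum_{s} T(s'' \mid s, a)\, b(s)}.$$
RL methods for POMDPs typically either maintain this belief explicitly (belief-MDP methods) or approximate it, for example with a recurrent memory over observations.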
3 votes · 0 answers
Can we use a Gaussian process to approximate the belief distribution at every instant in a POMDP?
Suppose $x_{t+1} \sim \mathbb{P}(\cdot | x_t, a_t)$ denotes the state transition dynamics in a reinforcement learning (RL) problem. Let $y_{t+1} \sim \mathbb{P}(\cdot | x_{t+1})$ denote the noisy observation or the imperfect state information. Let…

math_phile (56)
3 votes · 0 answers
Is Monte Carlo tree search needed in partially observable environments during gameplay?
I understand that with a fully observable environment (chess, Go, etc.) you can run MCTS with an optimal policy network for future planning purposes. This allows you to pick actions for gameplay, which will result in the maximum expected return from…

Yohahn Ribeiro (31)
3 votes · 1 answer
What is the intuition behind grid-based solutions to POMDPs?
After spending some time reading about POMDPs, I'm still having a hard time understanding how grid-based solutions work.
I understand the finite-horizon brute-force solution, where you have your current belief distribution, enumerate every possible…

FourierFlux (783)
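As a brief note on the question above, the usual intuition for grid-based methods is: discretize the belief simplex into a finite set of grid beliefs $\{b_i\}$, run value iteration only at those points, and evaluate any other belief by convex interpolation. In the notation assumed for this sketch,
$$\tilde{V}(b) = \sum_i \lambda_i V(b_i) \quad \text{where } b = \sum_i \lambda_i b_i,\ \lambda_i \ge 0,\ \sum_i \lambda_i = 1,$$
so the value function is approximated over a fixed grid instead of over the exponentially many beliefs reachable in the finite-horizon brute-force approach.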
3 votes · 1 answer
Is it possible for value-based methods to learn stochastic policies?
Is it possible for value-based methods to learn stochastic policies? I'm trying to get a clear picture of the different categories of RL algorithms, and while doing so I started to think about settings where the optimal policy is stochastic…

Krrrl (211)
2 votes · 1 answer
Is my understanding of the differences between MDP, Semi-MDP and POMDP correct?
I just wanted to confirm that my understanding of the different Markov Decision Processes is correct, because they are the fundamentals of reinforcement learning. Also, I read a few literature sources, and some are not consistent with each other.…

Rui Nian (423)