
Are there any reinforcement learning algorithms for learning optimal policies in a partially observable Markov decision process (POMDP), i.e. when the state is not perfectly observed? More specifically, how does one update the belief state using Bayes' rule when the state-transition kernel $Q$ is not known?
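For concreteness, this is the kind of update I have in mind for the case where the kernels *are* known; my difficulty is what to do when $Q$ is not available. A minimal sketch in Python, where the names `Q`, `Q_obs`, `belief_update` and the numbers are placeholders for illustration only:

```python
import numpy as np

# Sketch of the exact Bayes belief update for a discrete POMDP,
# assuming the transition kernel Q and observation kernel Q_obs are known.
# Q[a, x, x'] = P(x_{t+1} = x' | x_t = x, a_t = a)
# Q_obs[x, y] = P(y_t = y | x_t = x)

def belief_update(belief, action, observation, Q, Q_obs):
    """One Bayes-filter step: b'(x') ∝ Q_obs[x', y] * sum_x Q[a, x, x'] * b(x)."""
    predicted = belief @ Q[action]                     # prediction: sum_x Q(x'|x,a) b(x)
    unnormalized = Q_obs[:, observation] * predicted   # correction with the new observation
    return unnormalized / unnormalized.sum()           # normalize to a distribution

# Tiny usage example: 2 states, 1 action, 2 observations (made-up numbers).
Q = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])      # shape (|A|, |X|, |X|)
Q_obs = np.array([[0.7, 0.3],
                  [0.4, 0.6]])    # shape (|X|, |Y|)
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, action=0, observation=1, Q=Q, Q_obs=Q_obs)
print(b1)  # posterior belief over states after one (action, observation) pair
```

Without knowledge of $Q$, I don't see how to carry out the prediction step above, which is why I am asking what algorithms exist for this setting.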

nbro
  • What do you mean by "update Q kernel"? – nbro Oct 08 '19 at 00:29
  • So a POMDP has a state-update kernel $Q$ such that $x_{t+1} \sim Q(\cdot \mid x_t, a_t)$ and an observation kernel $Q^o$ such that the observation $y_t \sim Q^o(\cdot \mid x_t)$. By "update" I mean the former, the state-update kernel. – Deepanshu Vasal Jan 10 '21 at 22:26
  • Can you please link me to a research paper or book (or whatever) that uses the word "kernel"? Maybe this is only a terminology issue, but I don't think that "kernel" is the right word here. – nbro Jan 10 '21 at 22:31
  • 1
    Here is a paper [link](http://www.ams.sunysb.edu/~feinberg/public/Feinberg_KZgMOR.pdf). In general you can google pomdp transition kernel to get many such results – Deepanshu Vasal Jan 12 '21 at 00:28
  • I am not sure if this is a duplicate of [this one](https://ai.stackexchange.com/q/11612/2444). I don't think so, as you're also asking a more specific question, but maybe you were just curious about how one could apply RL to POMDPs. Let me know. – nbro Dec 19 '21 at 18:53

0 Answers