Why can an AI like AlphaStar work in StarCraft even though the environment is only partially observable? As far as I know, there are no theoretical results on RL in POMDP environments, yet core RL techniques appear to be used in partially observable domains.

nbro
FourierFlux
  • See this related but more specific question: [Can Q-learning be used in a POMDP?](https://ai.stackexchange.com/q/11612/2444). – nbro Apr 03 '20 at 20:30
  • It's not even clear whether SC2 can be represented as any type of Markov model, observable or not, but the system still works. – FourierFlux Apr 03 '20 at 20:35
  • I am not familiar enough with AlphaStar or StarCraft to explain AlphaStar's success, but this is not the first time that an assumption probably doesn't hold in a real-world problem, yet the applied method (which assumes that it holds) still performs decently or even well. For example, if I recall correctly, naive Bayes makes some assumptions that are just unrealistic but, in practice, it still works in many cases. Why does it work? I don't know, because I am not an expert on naive Bayes and the problems it's been applied to. – nbro Apr 03 '20 at 20:39
  • However, I can say that, in many cases, people try to turn the environment into an MDP (that is, to make the Markov property hold) with a few tricks. For example, a typical trick in RL is to combine several successive frames of a video (or video game) to build a state, rather than using only one frame as the state. – nbro Apr 03 '20 at 20:40
  • Also, note that POMDPs are not exactly what you are thinking of. In a POMDP, the agent doesn't know the state it is in, so it maintains a "belief" (i.e. a probability distribution) over the possible states. But your question is still very interesting and definitely legitimate! – nbro Apr 03 '20 at 20:52
  • Practically speaking, the agent doesn't know its state. The true state of the environment would display every piece, but the agent can only see areas within a certain radius of its own units. Based on prior observations, it could develop a belief about possible unit positions... – FourierFlux Apr 03 '20 at 21:04
  • Yes, maybe you're right. Even if you combine multiple frames, that may still not represent a state. It could just be an approximation of the true state. – nbro Apr 03 '20 at 21:07
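
As a rough illustration of the two ideas raised in the comments above (stacking successive frames to approximate a Markov state, and maintaining a belief over hidden states), here is a minimal Python sketch. The names `FrameStack` and `belief_update`, and the tiny discrete model they operate on, are hypothetical and chosen only for illustration; nothing here reflects how AlphaStar is actually implemented.

```python
from collections import deque


class FrameStack:
    """Hypothetical helper: treats the k most recent observations as one 'state'."""

    def __init__(self, k):
        # A deque with maxlen=k silently drops the oldest frame on append.
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # Fill the buffer with copies of the first observation so the
        # stacked state always has exactly k entries.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def step(self, obs):
        # Push the newest observation and return the stacked state.
        self.frames.append(obs)
        return tuple(self.frames)


def belief_update(belief, transition, obs_likelihood):
    """One step of a discrete Bayes filter (assumed toy model).

    belief[s]          -- prior probability of hidden state s
    transition[s][s2]  -- P(s2 | s), with the action already fixed
    obs_likelihood[s2] -- P(received observation | s2)
    """
    n = len(belief)
    # Prediction: push the prior belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correction: weight each state by how well it explains the observation.
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]
```

For example, `FrameStack(3)` followed by `reset("a")` and `step("b")` yields the stacked state `("a", "a", "b")`, and calling `belief_update([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.2, 0.8])` shifts probability mass toward the second hidden state. As noted in the comments, a stack of frames is only an approximation of the true state, whereas the belief update needs a known transition and observation model, which a game like StarCraft does not hand to the agent.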

0 Answers