
According to the definition of a fully observable environment in Russell & Norvig, AIMA (2nd ed), pages 41-44, an environment is only fully observable if it requires zero memory for an agent to perform optimally, that is, all relevant information is immediately available from sensing the environment.

From this definition and from the definition of an "episodic" environment in the same book, it follows that all fully observable environments are, in fact, episodic, or can be treated as episodic. This seems counterintuitive, but it follows logically from the definitions. Also, no stochastic environment can be fully observable, even if the entire state of the environment at a given point in time can be observed, because rational action may depend on a previous observation that must be remembered.

Am I wrong?

nbro


No, not all fully observable environments are episodic. Let's take a look again at the definitions from the book:

Fully Observable Environment (section 2.3.2)

If an agent’s sensors give it access to the complete state of the environment at each point in time, then we say that the task environment is fully observable. A task environment is effectively fully observable if the sensors detect all aspects that are relevant to the choice of action.

Episodic Environment (section 2.3.2)

In an episodic task environment, the agent’s experience is divided into atomic episodes. In each episode the agent receives a percept and then performs a single action. Crucially, the next episode does not depend on the actions taken in previous episodes.

Take note of the "crucial" part at the end of the definition of episodic environment. A fully observable environment that is not episodic (and therefore sequential in the book's taxonomy) is chess. Chess is fully observable because the player can view the positions of all active pieces on the chess board, and that is all the information that needs to be known in order to take the optimal action. But chess is not episodic, because the player's current move depends on all previous moves, and the current move will have downstream effects in later turns.

In fact, Figure 2.6 on p. 45 of the book lists three examples of fully observable sequential (i.e., not episodic) environments: crossword puzzles, chess, and backgammon. There are, of course, many more. Most games are sequential, because that is their main appeal: how do I best sequence my moves now in order to ensure victory over my opponent at a future time?
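To make the distinction concrete, here is a minimal Python sketch under stated assumptions. `InspectionEnv`, `NimEnv`, `memoryless_nim_policy`, and `trace` are hypothetical names made up for illustration: the inspection task is loosely modeled on the book's example of an agent spotting defective parts on an assembly line, and Nim (take 1 to 3 stones, last stone wins) stands in for chess because it is tiny. Both environments give the agent the complete state, and the Nim agent needs no memory at all, since its optimal move depends only on the current pile size. Yet only the inspection task is episodic: its future percepts are identical whatever the agent does, while every Nim move changes what the agent sees next.

```python
# Hypothetical toy environments (not from AIMA) contrasting an episodic task
# with a fully observable but sequential one.

class InspectionEnv:
    """Episodic: parts arrive on a fixed conveyor, and the accept/reject
    decision never changes which part the agent sees next."""

    def __init__(self, parts):
        self.parts = list(parts)
        self.i = 0

    def percept(self):
        return self.parts[self.i]

    def step(self, action):
        self.i += 1  # the next part does not depend on `action`
        return self.percept() if self.i < len(self.parts) else None


class NimEnv:
    """Fully observable but sequential: the pile size is the complete state,
    and every move changes the pile the agent will see next."""

    def __init__(self, stones):
        self.stones = stones

    def percept(self):
        return self.stones  # nothing is hidden from the agent

    def step(self, take):
        if take not in (1, 2, 3) or take > self.stones:
            raise ValueError("illegal move")
        self.stones -= take
        return self.stones


def memoryless_nim_policy(stones):
    """Optimal move in normal-play Nim, computed from the current percept
    alone: leave the opponent a multiple of 4. If the pile is already a
    multiple of 4, no move wins, so just take 1."""
    return stones % 4 or 1


def trace(env, actions):
    """Percepts observed while executing a fixed sequence of actions."""
    seen = [env.percept()]
    for a in actions:
        seen.append(env.step(a))
    return seen


if __name__ == "__main__":
    parts = ["ok", "defective", "ok"]
    # Episodic: different decisions, identical future percepts.
    print(trace(InspectionEnv(parts), ["accept", "accept"]))  # ['ok', 'defective', 'ok']
    print(trace(InspectionEnv(parts), ["reject", "reject"]))  # ['ok', 'defective', 'ok']
    # Sequential: different moves, different future percepts.
    print(trace(NimEnv(10), [1, 1]))  # [10, 9, 8]
    print(trace(NimEnv(10), [3, 3]))  # [10, 7, 4]
    print(memoryless_nim_policy(10))  # 2
```

The point of the sketch is that "needs no memory" and "episodic" are separate properties: full observability removes the need to remember past percepts, but it does not stop the agent's actions from shaping future states.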

adamconkey
  • I'm not sure your claim that chess is not episodic is correct, even though Figure 2.6 really says that "chess with a clock" is sequential and not episodic. If you consider a single chess game, that could be viewed as a single episode. Another chess game would be another episode, which doesn't really depend on the previous episodes, so chess would be an episodic task. – nbro Jan 24 '21 at 00:41
  • I see your point, I think it depends on what you consider the agent's action space though. In a game of chess the action space is moving a piece and that move will depend on the previous moves within the game. With what you're suggesting, the action space is "play a game of chess", which could make sense if your task is to say learn a strategy that's effective against a particular opponent. But I think that's a different level of abstraction than how you characterize the game of chess itself. At that higher abstraction, nearly all games are episodic. – adamconkey Jan 24 '21 at 02:36
  • When someone says "episodic task", I think of RL. Are you familiar with RL? If I were to formulate chess as an RL problem, how would you define an episode? Wouldn't an episode be a full game of chess, until termination? That's why I'm not sure about your conclusion (and about Figure 2.6; I actually didn't fully read that section, so I don't know exactly why Norvig and Russell decided to categorize chess as non-episodic). – nbro Jan 24 '21 at 02:39
  • Yeah based on the definition R&N give I think they don't have the RL sense of episode in mind, because it's about a single percept->action iteration, and they're differentiating whether the action sequence is Markovian or not. – adamconkey Jan 24 '21 at 03:04
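The two senses of "episode" discussed in these comments can be sketched with the same kind of toy game. This is a hypothetical illustration, not code from the book: Nim (take 1 to 3 stones, last stone wins) stands in for chess, and `play_game`, `optimal`, and `take_one` are made-up names. At the level of whole games, each game starts from the same pile no matter how earlier games went, which is the RL sense of an episode. Within a single game, every move changes the next percept, which is why the game is sequential in R&N's per-percept sense.

```python
# Hypothetical sketch: one game, viewed at two levels of granularity.

def optimal(stones):
    """Leave the opponent a multiple of 4; if that is impossible, take 1."""
    return stones % 4 or 1


def take_one(stones):
    """A weak but always-legal policy."""
    return 1


def play_game(start, policy_a, policy_b):
    """One RL-style episode: a full game of Nim starting from `start` stones.
    Returns the pile sizes seen before each move and the winner (0 or 1)."""
    if start < 1:
        raise ValueError("the game needs at least one stone")
    policies = (policy_a, policy_b)
    pile, player, seen = start, 0, []
    while True:
        seen.append(pile)  # the full state, observed before each move
        pile -= policies[player](pile)
        if pile == 0:
            return seen, player  # whoever took the last stone wins
        player = 1 - player


if __name__ == "__main__":
    # RL sense: games do not influence each other, so replays are identical.
    games = [play_game(10, optimal, take_one) for _ in range(3)]
    print(games[0])  # ([10, 8, 7, 4, 3], 0)
    print(all(g == games[0] for g in games))  # True
    # R&N sense: within a game, changing player 0's moves changes later percepts.
    print(play_game(10, take_one, take_one))  # ([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], 1)
```

Under this reading, both comments can hold at once: the same task is a sequence of independent episodes at the game level and a sequential environment at the move level.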