
I found some literature on the design of action spaces, e.g. that discretizing continuous actions in video-game environments can be crucial for successful learning (Action Space Shaping in Deep Reinforcement Learning, 2020).

However, I didn't find anything discussing this topic for observations. I'm not talking about normalization of observations; since NNs are the usual function approximators in DRL, they naturally perform better with normalized inputs.

For example, continuous observations might be discretized into ordinal or categorical values. Imagine a domain where the agent can close or open a gate, and it should learn whether the next object should pass or not. The agent also has to learn how to close the gate (the actions being "reduce or increase gate width", so the agent has to pick one action multiple times in a row to reach a certain width). Now one can design the environment to provide a continuous value for the gate width, or discretize this and provide a flag indicating whether the gate is open wide enough. Are there references that discuss this topic?
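To make the two designs concrete, here is a rough sketch using gymnasium-style spaces (the environment, threshold and names are made up purely for illustration):

```python
import numpy as np
from gymnasium import spaces

# Variant A: expose the raw, continuous gate width to the agent.
continuous_obs_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)

# Variant B: expose only a binary flag "gate is open wide enough for the next object".
discrete_obs_space = spaces.Discrete(2)

def width_to_flag(gate_width: float, required_width: float) -> int:
    """Collapse the continuous width into the single flag the agent actually needs."""
    return int(gate_width >= required_width)
```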

The reason I'm thinking about this is that, considering a policy as a mapping from states to actions, discretizing the observations will eventually leave fewer state-action mappings to be learned. Surely this can come at the cost of a lower peak performance, but I could also imagine faster training.
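To illustrate that intuition in tabular terms (even though a DRL agent would use a NN), binning the width into, say, 10 intervals means the policy only has to distinguish 10 states instead of a continuum. A minimal sketch with made-up numbers:

```python
import numpy as np

bin_edges = np.linspace(0.0, 1.0, num=11)   # 10 equal-width bins over [0, 1]
num_actions = 2                              # e.g. "increase width" / "decrease width"

def discretize(width: float) -> int:
    """Map a continuous gate width to one of a small number of bin indices."""
    return int(np.digitize(width, bin_edges[1:-1]))  # index in [0, 9]

# A tabular policy over the binned observation needs only 10 x 2 entries,
# whereas over the raw continuous width it cannot even be enumerated.
policy_table = np.zeros((len(bin_edges) - 1, num_actions))
```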

Is my thinking correct here? Or could discretizing the observations harm learning in other ways?

kitaird

1 Answer


I believe that discretizing the action/state space when using function approximators like NNs is only acceptable when losing information is acceptable. Why would you discretize an observation when the precise value of a continuous feature is important for making a decision?

Imagine, for example, control scenarios, one of the fields that fits the decision-making characteristics of MDPs: if the system produces similar results for decisions based on a range of values, such as a sensor that measures temperature, it could be plausible to discretize it to reduce the observation space. Conversely, if the system needs precise values to make decisions, it is probably not a good idea. I mostly saw works tackling this back in the days of tabular RL, where it was a necessity due to the curse of dimensionality. Nowadays I mainly see approaches that deal with this more efficiently, such as actor-critic methods for continuous action spaces.

In short, whether to discretize depends on your specific domain, but the goal is to give the agent enough information to take the best action. You may discretize your observation space, which will then require a lighter architecture to learn, at the cost of a less efficient policy. On the other hand, with non-discretized features you may be feeding the agent irrelevant information that does not highlight the important characteristics of the environment. Representation is a tough task in RL!
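As a sketch of what I mean by discretizing a sensor reading, assuming a gymnasium-style interface (the environment id, wrapper and bin edges are hypothetical):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class CoarseTemperatureWrapper(gym.ObservationWrapper):
    """Replaces a continuous temperature reading with a coarse band index.

    Whether this helps or hurts depends on whether the precise value
    actually matters for the decision.
    """

    def __init__(self, env: gym.Env, band_edges=(15.0, 25.0, 35.0)):
        super().__init__(env)
        self.band_edges = np.asarray(band_edges)
        # 3 edges -> 4 bands: "cold", "mild", "warm", "hot"
        self.observation_space = spaces.Discrete(len(band_edges) + 1)

    def observation(self, observation):
        # observation is assumed to be a scalar (or length-1 array) temperature
        return int(np.digitize(np.squeeze(observation), self.band_edges))

# Usage (hypothetical env id):
# env = CoarseTemperatureWrapper(gym.make("TemperatureControl-v0"))
```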

HenDoNR
  • I thought about this too and was wondering whether there are more specific approaches for environments where training data is very scarce. I thought that reducing continuous observations to more discrete ones could ease learning, but I was also concerned that the agent gets less specific feedback from the environment with discretized observations. Consider that the agent has to learn a sequence of actions to reach a state change. If even one action in the sequence is missing, the state change does not happen, and the policy might see that as equally wrong as a random action sequence. – kitaird Jan 24 '22 at 15:19