
I am new to DRL and trying to implement my custom environment. I want to know if normalization and regularization techniques are as important in RL as in Deep Learning.

In my custom environment, the state/observation values span very different ranges. For example, one observation is in the range [1, 20], while another is in [0, 50000]. Should I apply normalization or not? I am confused. Any suggestions?

nbro
moyukh
  • At the very least you’ll have to train separate networks for the different size states. – David Sep 07 '21 at 08:33
  • is it the same field of the state (observation), or are they two different fields of the state that have different magnitude and you think need normalization across fields? If it is the same field, then doesn't it (always) have the same range (not [1,20] but always [min,max] where min/max is the known min/max in an episode) ? Or do you mean that the states' magnitude changes on different episodes? – Sanyou Sep 08 '21 at 01:50
  • 1
    They are different fields. The observation is of dimension (5 x 1). One of them is in the range of [1, 20], other is in [0, 50000]. Actually, the first one (1-20) is like time steps which increase with the number of steps in each episode, whereas the second one (0 -50000) is like a residual budget. It decreases with steps and if it becomes zero, the agent is penalized. – moyukh Sep 08 '21 at 04:18

2 Answers


The purpose of normalisation in neural networks, and in many other machine learning methods (though not all - decision trees are a notable exception), is to improve the shape of the parameter space with respect to the optimisers that will be applied to it.

If you are using a function approximator that benefits from normalisation in supervised learning scenarios, it will also benefit from it in reinforcement learning scenarios. That is definitely the case for neural networks, which are by far the most common approximators used in deep reinforcement learning.

Unlike in supervised learning, you will not have a definitive dataset from which you can compute the mean and standard deviation needed to scale to the common $\mu = 0, \sigma = 1$. Instead, you will want to scale to a fixed range, such as $[-1, 1]$, using known or assumed bounds on each observation.
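As a minimal sketch of this, using the example bounds from the question (time step in [1, 20], budget in [0, 50000]) - substitute your own per-feature bounds:

```python
import numpy as np

# Known per-feature bounds, taken from the question's example.
low = np.array([1.0, 0.0])
high = np.array([20.0, 50000.0])

def scale_to_range(obs, low, high):
    """Linearly map each feature from [low, high] to [-1, 1]."""
    return 2.0 * (obs - low) / (high - low) - 1.0

obs = np.array([10.0, 25000.0])
print(scale_to_range(obs, low, high))  # mid-range inputs land near 0
```

If an observation can occasionally exceed its assumed bounds, you may also want to clip the result (e.g. `np.clip(..., -1.0, 1.0)`) so outliers do not push the network inputs far outside the expected range.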

You may also want to perform some basic feature engineering first, such as taking the log of a value, or some power of it - anything that makes the distribution of values you expect to see closer to a Normal distribution. Again, this is easier to do in supervised learning, where you can inspect the whole dataset, but you may know enough about a feature to make a good guess.
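For instance, a heavily skewed feature such as the budget in [0, 50000] from the question could be log-transformed before scaling (a sketch; `log_scale` is just an illustrative helper, not from any library):

```python
import numpy as np

def log_scale(x, x_max):
    """Compress a skewed non-negative feature with log1p, then map
    the result from [0, log1p(x_max)] to [-1, 1]."""
    z = np.log1p(x)            # log(1 + x), safe at x = 0
    z_max = np.log1p(x_max)
    return 2.0 * z / z_max - 1.0

budgets = np.array([0.0, 100.0, 5000.0, 50000.0])
print(log_scale(budgets, 50000.0))
```

The log transform spreads out the small values and compresses the large ones, which can help when most observed values cluster near the bottom of the range.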

Neil Slater

On creating custom environments:

... always normalize your observation space when you can, i.e., when you know the boundaries (From stable-baselines)

You could normalize them as part of the environment's state space or before passing them as input to the policy. Depending on the agent's algorithm implementation, what works for you may vary.

(See this answer from a related question)

mugoh