5

In section 3 of the paper Continuous control with deep reinforcement learning, the authors write

As detailed in the supplementary materials we used an Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930) to generate temporally correlated exploration for exploration efficiency in physical control problems with inertia (similar use of autocorrelated noise was introduced in (Wawrzynski, 2015)).

In section 7, they write

For the exploration noise process we used temporally correlated noise in order to explore well in physical environments that have momentum. We used an Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930) with θ = 0.15 and σ = 0.2. The Ornstein-Uhlenbeck process models the velocity of a Brownian particle with friction, which results in temporally correlated values centered around 0.

In a few words, what is the Ornstein-Uhlenbeck process? How does it work? How exactly is it used in DDPG?

I want to implement the Deep Deterministic Policy Gradient algorithm, and, in the initial actions, noise has to be added. However, I cannot understand how this Ornstein-Uhlenbeck process works. I have searched the internet, but I have not understood the information that I found.

nbro
dani
    It should be noted that more recent work suggests that uncorrelated Gaussian noise works just as well. TD3 paper (https://arxiv.org/pdf/1802.09477.pdf): "Unlike the original DDPG, we used uncorrelated noise for exploration as we found noise drawn from the Ornstein-Uhlenbeck (Uhlenbeck & Ornstein, 1930) process offered no performance benefits." D4PG paper (https://arxiv.org/pdf/1804.08617.pdf): "We experimented with correlated noise drawn from an Ornstein-Uhlenbeck process, as suggested by (Lillicrap et al., 2016), however we found this was unnecessary and did not add to performance." – Jonas De Schouwer Feb 01 '22 at 09:47

1 Answer

6

The Ornstein-Uhlenbeck process is defined (in the continuous-time setting) as:

$$dX_t = -\beta(X_t - \alpha)dt + \sigma dW_t$$

The discrete-time analogue of this process, which I assume is the form applicable in the RL case, is: $$X_{t+1} = X_t -\beta(X_t - \alpha) + \sigma \{W_{t+1}-W_t\}$$ $$X_{t+1} = (1 -\beta)X_t + \alpha\beta + \sigma \{W_{t+1}-W_t\}$$

In the RL setting, the terms in the equation mean roughly the following:

  • $X_t$ is the value of the process at time $t$, a real number $\in \mathbb R$ (the position the particle has moved to at time $t$). In DDPG, this is the exploration noise that gets added to the actor's action at time step $t$.
  • $\beta$ and $\alpha$ are constants that determine the movement characteristics of the particle: $\alpha$ is the long-run mean the process reverts to, and $\beta$ controls how strongly the process is pulled back toward $\alpha$ (the DDPG paper uses $\theta = 0.15$ for this mean-reversion rate, with $\alpha = 0$). Check here for graphs plotted for various $\beta$.
  • $W_t$ is a Wiener process: it starts at $W_0 = 0$ and accumulates independent increments as $W_{t+1} = W_t + \mathcal N(0,1)$, which is basically a random walk. The standard choice for the increments is $\mathcal N(0,1)$, which can be formulated as $W_t - W_s = \sqrt{t-s}\, \mathcal N(0,1)$. This holds because $W_t$ can be written recursively as $W_t = \mathcal N(0,1) + W_{t-1} = \mathcal N(0,1) + \mathcal N(0,1) + \dots + W_s$, and since the samples at each step are independent, the means add as $\mu_t + \mu_{t-1} + \dots$ and the variances as $\sigma_t^2 + \sigma_{t-1}^2 + \dots$. Since here each increment has mean $0$ and variance $1$, the sum has mean $\mu = 0$ and variance $\sigma^2 = t-s$, and hence, by the properties of Gaussian random variables (easy to show via a change of variables), you can write $W_t - W_s = \sqrt{t-s}\, \mathcal N(0,1)$.
  • $\sigma$ is the weighting factor of the Wiener process, i.e. it controls the amount of noise injected into the process at each step (the DDPG paper uses $\sigma = 0.2$).
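The discrete update above can be sketched in a few lines of NumPy. This is only an illustrative implementation (the class name `OUNoise` and its structure are my own, not from the paper's code); it uses the paper's values $\theta = 0.15$, $\sigma = 0.2$, with the mean $\alpha$ (written `mu` here) set to $0$, and a unit time step so that $W_{t+1} - W_t \sim \mathcal N(0,1)$:

```python
import numpy as np

class OUNoise:
    """Discrete Ornstein-Uhlenbeck process, one value per action dimension."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu = mu * np.ones(size)   # alpha: long-run mean the process reverts to
        self.theta = theta             # beta: strength of the pull back toward mu
        self.sigma = sigma             # scale of the Wiener increments
        self.rng = np.random.default_rng(seed)
        self.state = self.mu.copy()

    def reset(self):
        # Restart the process at the mean (typically at the start of each episode).
        self.state = self.mu.copy()

    def sample(self):
        # X_{t+1} = X_t + theta * (mu - X_t) + sigma * (W_{t+1} - W_t),
        # where each Wiener increment is drawn i.i.d. from N(0, 1).
        dx = self.theta * (self.mu - self.state) \
            + self.sigma * self.rng.standard_normal(self.state.shape)
        self.state = self.state + dx
        return self.state

noise = OUNoise(size=2)
samples = np.array([noise.sample() for _ in range(1000)])
print(samples.shape)  # (1000, 2): temporally correlated values centered around 0
```

In a DDPG training loop you would then act with something like `action = actor(state) + noise.sample()` (clipped to the action bounds) and call `reset()` at the start of each episode, so that consecutive actions are perturbed in a temporally correlated way.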

Another useful resource on the discrete Ornstein-Uhlenbeck process, much less generalized. I think you can now extend this to whatever RL scenario you are interested in.