
After reading some literature on reinforcement learning (RL), it seems that stochastic approximation theory underlies all of it.

There's a lot of substantial and difficult theory in this area, requiring measure theory and leading to martingales and stochastic approximation.

The standard RL texts at best mention the relevant theorem and then move on.

Is the field of RL really stochastic approximation theory in disguise? Is RL just a less rigorous version of stochastic approximation theory?

FourierFlux

1 Answer


Is the field of RL really stochastic approximation theory in disguise? Is RL just a less rigorous version of stochastic approximation theory?

No, but reinforcement learning (RL) is based on stochastic approximation theory (SAT), and these two fields overlap.

In RL, you typically assume that the underlying problem can be modeled as a Markov decision process (MDP), and the goal is to find a policy (or value function) that solves this MDP. To find this policy, you can use stochastic approximation algorithms, such as Q-learning, but RL isn't just SAT: in SAT, in general, there isn't necessarily any notion of an MDP.

SAT is the study of iterative algorithms that find the extrema (or roots) of functions which can only be observed through noisy samples, and of the conditions under which these iterative algorithms converge. SAT isn't applied only in RL; it is applied in many other fields, such as deep learning. The paper Scalable estimation strategies based on stochastic approximations: Classical results and new insights (2015) by P. Toulis et al. provides an overview of SAT and the connections with other fields (including RL).
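To make this concrete, here is a minimal sketch (my own toy example, not from any particular library) of the classic Robbins–Monro scheme: estimating the mean of a distribution from noisy samples, using step sizes $a_n = 1/n$, which satisfy the Robbins–Monro conditions $\sum_n a_n = \infty$ and $\sum_n a_n^2 < \infty$.

```python
import random

def robbins_monro_mean(sample, n_steps=100_000, seed=0):
    """Estimate the mean of a distribution by stochastic approximation.

    Solves E[x - theta] = 0 with the Robbins-Monro update
        theta_{n+1} = theta_n + a_n * (x_n - theta_n),
    where the step sizes a_n = 1/n satisfy
    sum(a_n) = inf and sum(a_n^2) < inf.
    """
    rng = random.Random(seed)
    theta = 0.0
    for n in range(1, n_steps + 1):
        x = sample(rng)                 # one noisy observation
        theta += (1.0 / n) * (x - theta)  # Robbins-Monro update
    return theta

# Estimate the mean of a Gaussian with true mean 3.0.
est = robbins_monro_mean(lambda rng: rng.gauss(3.0, 1.0))
```

With $a_n = 1/n$ this particular update reduces to the running sample mean, which is the simplest example of a stochastic approximation algorithm; the same conditions on the step sizes reappear in convergence proofs for RL algorithms.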

To conclude, RL is based on SAT, but RL isn't just stochastic approximation algorithms, so they are distinct fields. If you want to study, e.g., the convergence properties of certain RL algorithms, you may need to study SAT. In fact, for example, the typical proof of convergence for tabular Q-learning assumes the Robbins–Monro conditions on the step sizes. However, you can do a lot of RL without even knowing that RL is based on SAT. Similarly, you can do a lot of SAT without ever caring about RL.
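For illustration, here is a self-contained toy (the MDP and all names are hypothetical) of tabular Q-learning on a two-state deterministic chain with a noisy terminal reward, using step sizes $1/N(s)$ (the visit count), which satisfy the Robbins–Monro conditions. The true values are $Q(1) = 1$ and $Q(0) = \gamma \, Q(1) = 0.9$.

```python
import random

def q_learning_chain(n_episodes=20_000, gamma=0.9, seed=0):
    """Tabular Q-learning on the chain 0 -> 1 -> terminal.

    There is one action per state, so Q is indexed by state only.
    Transition 0 -> 1 gives reward 0; transition 1 -> terminal gives a
    noisy reward with mean 1. Step sizes 1/N(s) satisfy the
    Robbins-Monro conditions, so Q converges to the true values.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(n_episodes):
        # State 0: reward 0, bootstrap from the value of state 1.
        counts[0] += 1
        q[0] += (1.0 / counts[0]) * (0.0 + gamma * q[1] - q[0])
        # State 1: noisy reward with mean 1, next state is terminal.
        counts[1] += 1
        r = rng.gauss(1.0, 0.5)
        q[1] += (1.0 / counts[1]) * (r - q[1])
    return q

q = q_learning_chain()
```

The update for state 0 bootstraps from the current estimate of $Q(1)$, which is what distinguishes this from plain averaging and is exactly where the SAT machinery is needed in the convergence proof.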

nbro
  • Sure, but at its core RL is essentially stochastic approximation with some stuff added on top. The Robbins–Monro theorem is pretty much what everything is built on, and without it RL wouldn't exist (at least in any rigorous sense). – FourierFlux Apr 15 '20 at 03:49
  • @FourierFlux I've updated my answer to further explain that, yes, they overlap, but they are distinct. – nbro Apr 15 '20 at 04:00
  • I wouldn't call it distinct, maybe a subfield. But can you really do any work in a theoretical sense in RL without a background in SAT? – FourierFlux Apr 15 '20 at 04:01
  • They are distinct in the sense that there are notions in RL (e.g. MDPs) that do not _necessarily_ exist in SAT. As I say in my answer, if you want to study the convergence properties of RL algorithms, you probably need SAT, yes, because RL algorithms are essentially iterative algorithms and the goal in RL is to find extrema of functions. However, note that RL wouldn't exist without actions (or decisions). In fact, RL is sometimes also called sequential decision-making. – nbro Apr 15 '20 at 04:04
  • 1
    @FourierFlux: Your claims are similar to "chemistry is just physics". Whilst true at some level, a researcher who only studies and practices physics will not be an expert chemist. – Neil Slater Apr 15 '20 at 07:09
  • Chemistry has its own theory/assumptions which predated modern physics. I'm not seeing that in RL, which is why I said it was a less rigorous version of SAT. – FourierFlux Apr 15 '20 at 14:40
  • @FourierFlux Well, we could say that all matter (including our brains that discovered and developed chemistry, RL, and SAT) is made of atoms, so, in the end, everything is an atom or a bunch of atoms. How productive is that reasoning for the purposes of distinguishing RL from SAT? – nbro May 22 '20 at 13:46