7

I want to know if there is any metric to use for measuring sample-efficiency of a reinforcement learning algorithm? From reading research papers, I see claims that proposed models are more sample efficient but how does one reach this conclusion when comparing reinforcement learning algorithms?

rert588
  • 320
  • 1
  • 7

1 Answers1

5

First of all, let's recall some definitions.

A sample in the context of (deep) RL is a tuple $(s_t, a_t, r_t, s_{t+1})$ representing the information associated with a single interaction with the environment.

As for sample efficiency, it is defined as follows [1]:

Sample efficiency refers to the amount of data required for a learning system to attain any chosen target level of performance.

So, how you measure it is closely related to the way it is defined.

For example, one way to do it would be as shown in the figure below:

  • On the y-axis you have the performance of your RL algorithm (e.g. in terms of average return over episodes as done in [2], or mean total episode reward across different environment runs as done in [3])

  • On the x-axis you have the number of samples that you took.

  • The dashed line corresponds to your performance baseline (e.g. the performance at which a certain game or any other RL environment is considered solved).

So, you can measure sample efficiency at the intersection, where you get the number of samples needed to reach the performance baseline. So, an algorithm that requires less samples would be more sample-efficient.

enter image description here

Another way to do it would be the other way around, i.e. the RL agent is provided with a limited budget for the number of samples it could take. As a result, you can measure sample efficiency my measuring the area under the curve, as illustrated below. So, that would be how much performance you got just by using those samples in the budget. An algorithm that achieves a higher performance than another with the same amount of samples would then be more sample efficient.

enter image description here

I am not aware if there exist RL libraries that would provide you with this measure out-of-the-box. However, if you're using Python for example, I believe that using libraries like scipy or scikit-learn along with matplotlib could do the job.

NB: Image credits go to the following presentation: DLRLSS 2019 - Sample Efficient RL - Harm Van Seijen

user5093249
  • 722
  • 4
  • 8