
I am studying the TD3 paper.

I am curious about the meaning of $\alpha$ in the part where the paper proves that overestimation occurs under certain conditions.

The mathematical argument goes like this:

$\exists \epsilon_1 \ \text{s.t.} \ \alpha \leq \epsilon_1 \implies \mathop{\mathbb{E}}[Q_\theta(s,\pi_{approx}(s))]\geq\mathop{\mathbb{E}}[Q_\theta(s,\pi_{true}(s))]$

$\exists \epsilon_2 \ \text{s.t.} \ \alpha \leq \epsilon_2 \implies \mathop{\mathbb{E}}[Q^\pi(s,\pi_{true}(s))]\geq\mathop{\mathbb{E}}[Q^\pi(s,\pi_{approx}(s))]$

Overestimation then occurs if, in expectation, the value estimate is at least as large as the true value with respect to $\phi_{true}$:

$\mathop{\mathbb{E}}[Q_{\theta}(s,\pi_{true}(s))]\geq\mathop{\mathbb{E}}[Q^{\pi}(s,\pi_{true}(s))]$

It follows that the value is overestimated when

$\mathop{\mathbb{E}}[Q_\theta(s,\pi_{approx}(s))]\geq\mathop{\mathbb{E}}[Q^\pi(s,\pi_{approx}(s))], \ \alpha < \min(\epsilon_1,\epsilon_2)$
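
If I read it correctly, this last inequality comes from chaining the three inequalities above. The first link needs $\alpha \leq \epsilon_1$, the middle link is the assumption on the value estimate, and the last link needs $\alpha \leq \epsilon_2$, which would explain why both bounds on $\alpha$ are required at once:

$\mathop{\mathbb{E}}[Q_\theta(s,\pi_{approx}(s))]\geq\mathop{\mathbb{E}}[Q_\theta(s,\pi_{true}(s))]\geq\mathop{\mathbb{E}}[Q^\pi(s,\pi_{true}(s))]\geq\mathop{\mathbb{E}}[Q^\pi(s,\pi_{approx}(s))]$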

In my opinion, $\alpha$ is just a threshold that makes these inequalities hold, right?

jackson
