I am reading the D3QN paper, and it contains the following paragraph:
> Equation (7) is unidentifiable in the sense that given $Q$ we cannot recover $V$ and $A$ uniquely. To see this, add a constant to $V(s; \theta, \beta)$ and subtract the same constant from $A(s, a; \theta, \alpha)$. This constant cancels out resulting in the same $Q$ value. This lack of identifiability is mirrored by poor practical performance when this equation is used directly.
>
> To address this issue of identifiability, we can force the advantage function estimator to have zero advantage at the chosen action.
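If I understand correctly, the paper then replaces the naive sum $Q = V + A$ with one of two aggregations (its Equations (8) and (9), as I recall them, so worth double-checking against the paper), subtracting either the max or the mean of the advantages:

$$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \Big(A(s, a; \theta, \alpha) - \max_{a'} A(s, a'; \theta, \alpha)\Big)$$

$$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \Big(A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s, a'; \theta, \alpha)\Big)$$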
How does forcing the advantage of the chosen action to zero (or, since the mean rather than the max operator is used in practice, to something slightly above zero) help the neural network learn accurate estimates of $V$ and $A$?
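For concreteness, here is how I picture the mean-based aggregation being wired into a dueling head (a minimal PyTorch-style sketch; `DuelingHead`, the layer shapes, and the single linear layer per stream are my own placeholders, not from the paper):

```python
# Minimal sketch of a dueling head with mean-subtraction aggregation (hypothetical names/sizes).
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim: int, num_actions: int):
        super().__init__()
        # Two separate streams on top of shared features.
        self.value = nn.Linear(feature_dim, 1)                 # V(s; theta, beta)
        self.advantage = nn.Linear(feature_dim, num_actions)   # A(s, a; theta, alpha)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)       # shape: (batch, 1)
        a = self.advantage(features)   # shape: (batch, num_actions)
        # Mean-subtraction aggregation: the advantages are shifted to have zero mean,
        # which pins down the V/A split up to that convention.
        # Max variant would be: v + a - a.max(dim=1, keepdim=True).values
        return v + a - a.mean(dim=1, keepdim=True)
```

Given shared features from the convolutional trunk, `DuelingHead(feature_dim, num_actions)(features)` returns the $Q$-values; since the subtracted term is constant across actions, the greedy action is unchanged by the aggregation.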