
During the training of a DQN, I noticed that the model with prioritized experience replay (PER) generally had a smaller loss than a DQN without PER. The mean squared loss was on the order of $10^{-5}$ for the DQN with PER, whereas it was on the order of $10^{-2}$ for the DQN without PER.

Do the smaller training errors have any effect on executing the final policy learned by the DQN?


1 Answer


I think it says something about the training progress. Another thing you can check is the gradient norm: sometimes the training loss is really noisy, while the gradient norm gives a much clearer signal.
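For concreteness, here is a minimal sketch of how one might log the global gradient norm in PyTorch; the training-loop names in the usage comment (`q_network`, `optimizer`, `q_values`, `td_targets`) are assumptions for illustration, not part of the original answer:

```python
import torch
import torch.nn as nn

def grad_norm(model: nn.Module) -> float:
    """Global L2 norm of all parameter gradients; call after loss.backward()."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Hypothetical usage inside a DQN training step:
#   loss = nn.functional.mse_loss(q_values, td_targets)
#   optimizer.zero_grad()
#   loss.backward()
#   print(grad_norm(q_network))  # log this per step alongside the loss
#   optimizer.step()
```

Logging this value every training step (or a running average of it) alongside the loss lets you see whether the updates are actually shrinking, even when the raw loss curve is noisy.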

  • You mean the gradient of the weights of the model? What would the gradient norm say about the training? – calveeen May 12 '20 at 00:51
  • Well, a smaller norm would indicate that training is about to converge, I guess. At a local minimum, all the dimensions of the gradient would be 0. – SpiderRico Jun 11 '20 at 00:55