I'm working with a PPO agent with a small, discrete action space (3 possible actions, 1 of which is always masked depending on the state).
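To make the setup concrete, here's a minimal sketch of the kind of masking I mean, assuming the common approach of setting the invalid action's logit to a large negative value before the softmax (my exact implementation isn't the point, just that the entropy is computed over the masked distribution):

```python
import numpy as np

def masked_policy(logits, mask):
    """Turn raw logits into action probabilities with invalid actions masked out.

    logits: array of shape (3,), one logit per action
    mask:   boolean array of shape (3,), True where the action is valid
    """
    masked_logits = np.where(mask, logits, -1e9)   # invalid action gets ~zero probability
    z = masked_logits - masked_logits.max()        # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return probs

# Example: action 2 is invalid in this state
print(masked_policy(np.array([2.0, 0.5, 1.0]), np.array([True, True, False])))
```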
Premise 1: My understanding is that the "entropy" of output probabilities is calculated according to the formula given here.
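For reference, I'm assuming the linked formula is the standard Shannon entropy of the categorical action distribution (natural log, as most PPO implementations report it):

$$H\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s)$$

With one of the three actions masked out, only two actions are available, so the maximum possible entropy is $\ln 2 \approx 0.69$.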
Premise 2: My entropy during training drops below 0.05 in under 100k timesteps. Using the above formula, I roughly estimate that this means the agent is assigning >98% probability to its chosen action in nearly all its decisions at that point.
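As a sanity check on that estimate, here's a quick sketch (assuming natural-log entropy over the two unmasked actions):

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (natural log) of a categorical distribution, ignoring zero entries."""
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]
    return -(nz * np.log(nz)).sum()

# With one of the three actions masked, only two actions carry probability mass.
print(entropy([0.98, 0.02]))    # ~0.098
print(entropy([0.99, 0.01]))    # ~0.056
print(entropy([0.995, 0.005]))  # ~0.031
```

An entropy below 0.05 lines up with roughly 99% probability on the chosen action, which is where the >98% figure comes from.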
Question: When the agent is already so confident, is there any point in continuing training? Will it learn anything else, or just keep exploiting the policy it found?
I realize I can run experiments to try to answer this empirically, and I have. It appears that it just stops learning at that point, but I wanted to validate my observations and my premises.