I am implementing A3C for the CartPole environment and want to compare its results with the ones I got from AC1. The problem is that I don't know which process to look at. If I use, say, 11 processes, should I report the first one that reaches an average of 495 points (over the last 100 episodes), the last one, or the mean over all of them?
I don't want to take the first one that reaches 495, since it is using a model that has already been updated by the first few processes, which looks like cheating. Is there an established convention I can follow to get valid results?
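
To make the three options concrete, here is a minimal sketch of how each could be computed. It assumes every worker process logs its own list of episode returns; the names `episodes_to_solve` and `summarize` are hypothetical helpers, not part of any A3C library, and the 495 threshold and 100-episode window are the ones mentioned above.

```python
from statistics import mean


def episodes_to_solve(returns, threshold=495.0, window=100):
    """Number of episodes after which the mean of the last `window`
    returns first reaches `threshold`, or None if it never does."""
    running_sum = 0.0
    for i, r in enumerate(returns):
        running_sum += r
        if i >= window:
            # Drop the return that just left the sliding window.
            running_sum -= returns[i - window]
        if i + 1 >= window and running_sum / window >= threshold:
            return i + 1
    return None


def summarize(per_worker_returns, threshold=495.0, window=100):
    """Compare the candidate statistics: first, last, and mean
    episodes-to-solve across workers (assumed logging format)."""
    solved = [episodes_to_solve(r, threshold, window) for r in per_worker_returns]
    solved = [s for s in solved if s is not None]
    if not solved:
        return None
    return {"first": min(solved), "last": max(solved), "mean": mean(solved)}
```

Calling `summarize` on the per-worker logs of one run gives the three numbers side by side. Since every worker updates the same shared model, none of them is an independent measurement, which is exactly why I am unsure which one is fair to report.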