I am fairly new to reinforcement learning (RL) and deep RL. I have been trying to create my first agent (using A3C) that selects an optimal path, with the reward based on the associated completion time (roughly: the better the path, the faster the packets are delivered).
However, in each episode/epoch, I do not have a fixed batch size for updating my NN's parameters.
To make this clearer: in my environment, at each step I issue a request to the servers and have to select the optimal path for it.
Moreover, each run of my environment does not involve the same number of files/requests. For instance, I might have 3 requests in one run and 5 requests in the next.
The A3C code I am working from uses a batch size of 100, but in my case the batch size is not known a priori.
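For reference, here is a minimal sketch (not my actual code) of the kind of update I mean, where the "batch" is simply whatever transitions the episode produced. It leaves out the asynchronous-worker/shared-model parts of A3C, and the environment interface (`env.reset()`, `env.step()` returning `(obs, reward, done)`), the network sizes, and the loss weighting are all placeholders I made up for this question:

```python
import torch
import torch.nn as nn

GAMMA = 0.99  # discount factor (placeholder value)

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_paths):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.policy_head = nn.Linear(64, n_paths)  # logits over candidate paths
        self.value_head = nn.Linear(64, 1)         # state-value estimate

    def forward(self, obs):
        h = self.shared(obs)
        return self.policy_head(h), self.value_head(h)

def run_episode(env, model):
    """Collect one episode; its length equals the number of requests,
    which varies from run to run (3 one time, 5 the next, ...)."""
    log_probs, values, rewards = [], [], []
    obs, done = env.reset(), False  # hypothetical env interface
    while not done:
        logits, value = model(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward, done = env.step(action.item())  # reward ~ -completion_time
        log_probs.append(dist.log_prob(action))
        values.append(value.squeeze())
        rewards.append(reward)
    return log_probs, values, rewards

def episode_update(optimizer, log_probs, values, rewards):
    """Actor-critic update on however many transitions the episode produced."""
    returns, R = [], 0.0
    for r in reversed(rewards):  # discounted returns, computed backwards
        R = r + GAMMA * R
        returns.insert(0, R)
    returns = torch.tensor(returns)
    values = torch.stack(values)
    advantages = returns - values.detach()
    policy_loss = -(torch.stack(log_probs) * advantages).sum()
    value_loss = ((returns - values) ** 2).sum()
    optimizer.zero_grad()
    (policy_loss + 0.5 * value_loss).backward()
    optimizer.step()
```

With this pattern, one update might be computed from 3 transitions and the next from 5, which is exactly the variable "batch size" I am unsure about.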
Is this going to affect my training? Should I find a way to keep the batch size fixed somehow? And how does one choose an optimal batch size for updates?