According to the question "How to deal with the time delay in reinforcement learning?", delay in reinforcement learning can take the form of observation delay, action delay, or reward delay.
I have a special case of delay, but I am not sure what kind of delay it is or how to deal with it.
For example, at state St0, my agent takes action A1, but we have to wait a while before receiving the reward R1. Meanwhile, my agent keeps taking actions A2 and A3. The tricky part is that A2 and A3 both influence the environment and may affect R1.
So the timeline is: the agent plays actions A1, A2, and A3, all of which take effect in the environment immediately, but we have to wait a while to see the rewards R1, R2, and R3.
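To make the timeline concrete, here is a minimal sketch of the kind of environment I have in mind. Everything in it is an assumption for illustration only: the class name `DelayedRewardEnv`, the fixed delay of two steps, the scalar state, and the toy reward (the state value at delivery time) are all hypothetical and not my real setup. The point is only that each action changes the state immediately, while the reward for that action is delivered later and therefore also reflects the actions taken in between.

```python
from collections import deque


class DelayedRewardEnv:
    """Toy environment: actions act immediately, rewards arrive `delay` steps late."""

    def __init__(self, delay=2):
        self.delay = delay      # assumed fixed delay, measured in steps
        self.state = 0          # hypothetical scalar state
        self.t = 0              # current step counter
        self.pending = deque()  # (step_taken, due_step) for undelivered rewards

    def step(self, action):
        self.t += 1
        self.state += action  # the action's effect on the environment is immediate
        self.pending.append((self.t, self.t + self.delay))

        delivered = []
        while self.pending and self.pending[0][1] <= self.t:
            step_taken, _ = self.pending.popleft()
            # Toy reward: computed from the state at delivery time, so actions
            # taken after `step_taken` also influence it.
            delivered.append((step_taken, float(self.state)))
        return self.state, delivered


env = DelayedRewardEnv(delay=2)
for action in (1, 2, 3):  # stand-ins for A1, A2, A3
    state, rewards = env.step(action)
    print(state, rewards)
```

In this sketch, no reward is returned for the first two steps. The reward for A1 only shows up on the third step, and its value already includes the effects of A2 and A3.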
Should we model this problem as an observation delay or a reward delay?
When my agent has received R1 but not yet R2 and R3, can I update my Q-table using eligibility traces or some other method?