In deep RL techniques, if I understand correctly, a replay buffer is used when training the neural networks. The purpose of the replay buffer is to store experience and feed a (randomly sampled) batch of unit transitions to the networks, since it is known that neural networks train well on i.i.d. data.
But in games, the experience trajectory seems important, as it contains the temporal dynamics. Am I right? If not, then all the knowledge required to learn the policy can be obtained from (out-of-sequence, randomly sampled) unit transitions alone.
Which of the two is correct?
Note that a unit transition in this question refers to the tuple $(s_t, a_t, r_t, s_{t+1})$.
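To make concrete what I mean by "randomly sampled unit transitions", here is a minimal sketch of the kind of replay buffer I have in mind (the class and method names are my own, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # Oldest transitions are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, s_t, a_t, r_t, s_next):
        # Each entry is one unit transition (s_t, a_t, r_t, s_{t+1}).
        self.buffer.append((s_t, a_t, r_t, s_next))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive transitions, approximating i.i.d. training data.
        return random.sample(self.buffer, batch_size)
```

In this setup, any ordering of the original trajectory is discarded at sampling time, which is exactly what makes me wonder whether the temporal structure is needed at all.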