The following quote is taken from the beginning of Part II, "Approximate Solution Methods" (p. 198), in "Reinforcement Learning: An Introduction" by Sutton & Barto (2018):
> reinforcement learning generally requires function approximation methods able to handle nonstationary target functions (target functions that change over time). In control methods based on GPI (generalized policy iteration) we often seek to learn $q_\pi$ while $\pi$ changes. Even if the policy [$\pi$] remains the same, the target values of training examples are nonstationary if they are generated by bootstrapping methods (DP and TD learning).
Could someone explain why this is not the case when we use non-bootstrapping methods (such as Monte Carlo, which is not allowed infinite rollouts)?
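To make sure I understand what "target" means here, below is a small toy sketch (my own illustration, not from the book) of the two kinds of targets under linear function approximation $\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$. The TD(0) target depends on the current weights $\mathbf{w}$, so it changes whenever the weights are updated, whereas the Monte Carlo target is just the observed return:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 4
w = rng.normal(size=n_features)          # current weight vector

def v_hat(x, w):
    """Approximate state value for feature vector x (linear in w)."""
    return w @ x

# --- Bootstrapping (TD(0)) target ---
# R + gamma * v_hat(x_next, w) depends on the current weights w,
# so every weight update changes the targets of later training examples.
def td_target(reward, x_next, w, gamma=0.9):
    return reward + gamma * v_hat(x_next, w)

# --- Monte Carlo target ---
# The target is the return G_t computed from the rewards of a finished
# episode; it does not involve w at all, so it stays fixed.
def mc_target(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

x_next = rng.normal(size=n_features)
print("TD target before weight update:", td_target(1.0, x_next, w))
w += 0.1 * rng.normal(size=n_features)   # pretend we did a learning step
print("TD target after weight update: ", td_target(1.0, x_next, w))   # changed
print("MC target (same either way):   ", mc_target([1.0, 0.0, 2.0]))  # fixed
```

Is this the distinction the authors are pointing at, or is there more to why Monte Carlo targets count as stationary when the policy is fixed?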