For questions related to the "on-policy" reinforcement learning algorithms.
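On-policy RL algorithms interact with the environment using their current approximation of the policy they are trying to learn. The experience gathered this way is then used to further update that same approximation. In other words, the policy that generates the behaviour is the same policy that is being evaluated and improved. An example of an on-policy algorithm is SARSA.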
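As an illustration of the on-policy idea, here is a minimal tabular SARSA sketch. It assumes a hypothetical 5-state corridor environment (`Corridor`) and an ε-greedy policy derived from the current Q-table. All names and hyperparameters are illustrative choices, not part of any library. The key on-policy step is that the action `A'` used in the update target is chosen by the same ε-greedy policy that the agent then actually executes.

```python
import random
from collections import defaultdict


class Corridor:
    """Hypothetical toy environment: states 0..n-1 in a line.

    Action 0 moves left, action 1 moves right. Reaching the last state ends
    the episode with reward +1; every other step gives reward 0.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    # The behaviour policy is derived from the current Q estimate,
    # so the agent acts with the very policy it is learning.
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def sarsa(env, n_actions=2, episodes=500, alpha=0.1, gamma=0.9,
          epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a), initialised to 0
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(q, state, n_actions, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            # A' is picked by the same epsilon-greedy policy and is the
            # action actually taken next: this is what makes SARSA on-policy.
            next_action = epsilon_greedy(q, next_state, n_actions, epsilon, rng)
            # Terminal transitions have no bootstrap term.
            target = reward if done else reward + gamma * q[(next_state, next_action)]
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
    return q
```

For example, `q = sarsa(Corridor())` should end up preferring action 1 (move right) in every non-terminal state. Replacing `q[(next_state, next_action)]` in the target with the maximum over actions in `next_state` would turn this into Q-learning, which is off-policy because its target no longer depends on the action the behaviour policy actually takes.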