In model-based reinforcement learning algorithms such as Dyna and Prioritized Sweeping, a model of the environment is constructed so that samples are used more efficiently. Separately, eligibility traces help an agent learn (action) value functions faster.
Is it possible to combine learning, planning, and eligibility traces in one algorithm to increase its convergence rate? If yes, how can eligibility traces be used in the planning part, e.g. in Prioritized Sweeping?
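
To make the question concrete, here is a minimal sketch (assuming a tabular setting with a deterministic model, and using a naive Q(λ)-style update that does not cut traces on exploratory actions) of what I mean by combining the three. Traces are applied only to the direct updates from real experience, while planning still performs one-step simulated updates; whether and how traces could also be applied to the out-of-order planning updates is exactly what I am asking about. All names here (`n_states`, `model`, `n_planning`, etc.) are my own illustrative choices, not from any library.

```python
import random
import numpy as np

n_states, n_actions = 10, 2
alpha, gamma, lam, epsilon, n_planning = 0.1, 0.95, 0.9, 0.1, 5

Q = np.zeros((n_states, n_actions))
E = np.zeros((n_states, n_actions))  # eligibility traces; reset to 0 at each episode start
model = {}                           # (s, a) -> (r, s'), a deterministic one-step model


def epsilon_greedy(s):
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(Q[s]))


def step(s, a, r, s_next):
    # Direct RL on real experience, with replacing eligibility traces.
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    E[s, a] = 1.0
    Q[:] += alpha * delta * E
    E[:] *= gamma * lam

    # Model learning: remember the last observed outcome of (s, a).
    model[(s, a)] = (r, s_next)

    # Planning: one-step updates on simulated transitions. No traces here --
    # since these updates visit (s, a) pairs in arbitrary order, it is unclear
    # to me how traces would even be defined for them.
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps_next]) - Q[ps, pa])
```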