I will give one perspective on this from the domain of robotics. You are right that most RL agents are trained in simulation, particularly for research papers, because a common simulated environment lets researchers benchmark their approaches against one another, at least in theory. Many of these environments exist strictly as test beds for new algorithms and are not even physically realizable, e.g. HalfCheetah. You could in theory run a separate simulator, say in another process, as your planning model, and treat the "real" simulator as your environment. But that is just a mocked setup for what you want in the end, which is a real-world agent in a real-world environment.
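To make the mocked setup concrete, here is a minimal sketch. Everything in it is an assumption for illustration: a toy 1-D point mass, a hypothetical `step` function standing in for both simulators, and a simple random-shooting planner. The planning model deliberately uses a different friction value than the "real" environment, to mimic an imperfect model.

```python
import numpy as np

def step(pos, vel, action, friction, dt=0.1):
    """Toy 1-D point mass with viscous friction (works on scalars or arrays)."""
    vel = vel + (action - friction * vel) * dt
    pos = pos + vel * dt
    return pos, vel

def plan(pos, vel, model_friction, rng, goal=1.0, horizon=15, n_candidates=256):
    """Random shooting: roll random action sequences through the planning
    model and return the first action of the cheapest sequence."""
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    p = np.full(n_candidates, pos, dtype=float)
    v = np.full(n_candidates, vel, dtype=float)
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        p, v = step(p, v, actions[:, t], model_friction)
        cost += (p - goal) ** 2
    return actions[np.argmin(cost), 0]

rng = np.random.default_rng(0)
pos, vel = 0.0, 0.0
for _ in range(60):
    # The planner only ever sees its own (slightly wrong) model...
    action = plan(pos, vel, model_friction=0.5, rng=rng)
    # ...while the chosen action is executed in the "real" simulator.
    pos, vel = step(pos, vel, action, friction=0.8)

final_distance = abs(pos - 1.0)
```

Replanning at every step is what lets the agent tolerate the mismatch here; with a cruder model or a harder task, that tolerance shrinks quickly.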
What you describe could be very useful, with one important caveat: the simulator needs to actually be a good model of the real environment. For robotics and many other interesting domains, this is a tall order. Getting a physics simulator to faithfully replicate the real-world environment can be tricky, as one may need accurate friction coefficients, masses and centers of mass, restitution coefficients, material properties, contact models, and so on. Oftentimes the simulator is too crude an approximation of the real-world environment to be useful as a planner.
That doesn't mean we're completely hosed, though. This paper uses highly parallelized simulators to search for simulation parameters that approximate the real world well. What's interesting is that it's not necessarily finding the correct real-world values for, e.g., friction coefficients, but rather values for the parameters that, taken together, produce simulations that match the real-world experience. The better the simulation gets at approximating what's going on in the real world, the more viable it is to use the simulator for task planning. I think with the advent of GPU-optimized physics simulators we will see simulators become a more useful tool even for real-world agents, as you can try many different things in parallel to get a sense of the likely outcome of a planned action sequence.
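As a rough illustration of that kind of parameter search (not the paper's actual method), here is a sketch under heavy assumptions: a toy 1-D body slowed by linear drag, a hypothetical `simulate` function vectorized over many candidate parameter sets, and a synthetic trajectory standing in for recorded real-world data. Only the ratio drag/mass affects the motion in this toy model, so the search can land on values that differ from the "true" ones while still reproducing the trajectory.

```python
import numpy as np

def simulate(drag, mass, v0=1.0, dt=0.01, steps=300):
    """Toy 1-D body slowed by linear drag, simulated for many candidates at once.
    Returns positions with shape (steps, n_candidates)."""
    drag = np.atleast_1d(np.asarray(drag, dtype=float))
    mass = np.atleast_1d(np.asarray(mass, dtype=float))
    v = np.full(drag.shape, v0)
    x = np.zeros(drag.shape)
    xs = np.empty((steps,) + drag.shape)
    for t in range(steps):
        v = v + (-drag / mass * v) * dt
        x = x + v * dt
        xs[t] = x
    return xs

rng = np.random.default_rng(0)

# Stand-in for a trajectory recorded on the real system (assumed values).
true_drag, true_mass = 0.8, 2.0
real = simulate(true_drag, true_mass)[:, 0]

# Sample many candidate parameter sets and simulate them all "in parallel".
n = 10_000
drag = rng.uniform(0.1, 2.0, n)
mass = rng.uniform(0.5, 5.0, n)
sim = simulate(drag, mass)

# Score each candidate by how well its trajectory matches the real one.
err = np.mean((sim - real[:, None]) ** 2, axis=0)
best = int(np.argmin(err))
best_drag, best_mass, best_err = drag[best], mass[best], err[best]

print(f"best drag={best_drag:.3f}, mass={best_mass:.3f}, "
      f"ratio={best_drag / best_mass:.3f}, error={best_err:.2e}")
```

The best candidate's drag and mass need not be close to 0.8 and 2.0 individually; what the search pins down is the combination that makes the simulated trajectory match, which is all a planner needs.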