In some newer robotics literature, the term system identification is used with a specific meaning. The idea is not to use a fixed model, but to create the model on the fly, which amounts to model-free system identification. A short remark for everyone who doesn't know the idea: system identification means creating a prediction model, better known as a forward numerical simulation. The model takes the input and calculates the outcome. It's not exactly the same as a physics engine, but both operate with a model in the loop that generates the output in real time.
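To make the "create the model on the fly" idea concrete, here is a minimal sketch of online system identification, assuming a simple linear system (the matrices `A_true` and `B_true` are made-up ground truth; the learner only sees the observed transitions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth dynamics the robot actually follows
# (unknown to the learner): next_state = A s + B a
A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B_true = np.array([[0.0], [0.5]])

# Observed transitions (state, action, next_state) collected while running
states = rng.normal(size=(200, 2))
actions = rng.normal(size=(200, 1))
next_states = states @ A_true.T + actions @ B_true.T

# System identification step: fit the forward model from data
# by least squares, i.e. next_state = [state action] @ theta
X = np.hstack([states, actions])
theta, *_ = np.linalg.lstsq(X, next_states, rcond=None)

# The learned model now acts like a tiny physics engine:
# given a state and an action, it predicts the outcome
s, a = np.array([0.2, -0.1]), np.array([0.3])
prediction = np.concatenate([s, a]) @ theta
```

The point is that `theta` is a prediction model, not a controller: it answers "what happens if I do this action", which is exactly the forward-simulation role described above.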
But what is policy learning? Somewhere I've read that policy learning is equal to online system identification. Is that correct? If so, it doesn't make much sense, because the goal of reinforcement learning is to learn a policy, and a policy is something that controls the robot. But if the aim is to do system identification, then the policy would be equal to the prediction model. Perhaps somebody can clear up the confusion about the different terms ...
Q-learning is a good example of reinforcement learning. The idea is to construct a Q-table, and this table controls the robot's movements. But if online system identification is equal to policy learning, and policy learning is equal to Q-learning, then the Q-table doesn't contain the servo signals for the robot but only provides a prediction of the system. That would mean the Q-table is equal to a Box2D physics engine which can say what x/y coordinates the robot will have. This interpretation doesn't make much sense. Or does it make sense, and the definition of a policy is quite different?
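For contrast with the forward-model reading, here is a toy sketch (made-up numbers, hypothetical action names) of what a Q-table usually is in reinforcement learning: a mapping from (state, action) to expected return, where the policy is just the argmax over actions. Note that it selects an action; it does not predict x/y coordinates:

```python
import numpy as np

# Toy Q-table: rows are states, columns are actions.
# The entries are learned value estimates, not servo signals
# and not position predictions.
q_table = np.array([
    #  left  right   (hypothetical actions)
    [0.1, 0.9],     # state 0
    [0.7, 0.2],     # state 1
])

def policy(state):
    # The policy derived from the table: pick the action
    # with the highest learned value in this state.
    return int(np.argmax(q_table[state]))

# policy(0) -> 1 ("right"), policy(1) -> 0 ("left")
```

So under the usual definition, the Q-table answers "which action is best here", while a system-identification model answers "what will happen next" -- which is why equating the two terms is confusing.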