I realize that my question is a bit fuzzy, and I am sorry for that. If needed, I will try to make it more rigorous and precise.
Let $\mathcal{M}$ be a Markov Decision Process, with state space $\mathcal{S}$ and action space $\mathcal{A}$. Let $\tau = (s_0, a_0, s_1, a_1, s_2, a_2, \dots)$ and $\tau' = (s_0', a_0', s_1', a_1', s_2', a_2', \dots)$ be two trajectories produced by an agent during two different episodes.
Question: Is there any standard way in the Reinforcement Learning literature to compare $\tau$ and $\tau'$? Ideally, I am looking for a "distance" $d(\tau, \tau')$ (it does not need to be a distance in the mathematical sense) that reflects the "distance" between the policies that generated $\tau$ and $\tau'$.
For example, it would be nice if $d(\tau, \tau')$ were a good estimator of the KL divergence between $\pi$ and $\pi'$, where $\pi$ is the policy that generated $\tau$ and $\pi'$ the policy that generated $\tau'$.
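For concreteness, here is a minimal Python sketch of the kind of quantity I have in mind: a Monte Carlo estimate of $\mathrm{KL}(\pi \,\|\, \pi')$ along a trajectory. The callables `log_prob_pi` and `log_prob_pi_prime` are hypothetical; in my setting I only observe the trajectories, not the policies, so I am looking for a proxy for this quantity computable from $\tau$ and $\tau'$ alone.

```python
import numpy as np

def trajectory_kl_estimate(trajectory, log_prob_pi, log_prob_pi_prime):
    """Monte Carlo estimate of KL(pi || pi') along one trajectory.

    `trajectory` is a list of (state, action) pairs.
    `log_prob_pi(s, a)` and `log_prob_pi_prime(s, a)` are hypothetical
    callables returning log pi(a|s) and log pi'(a|s); they require access
    to both policies, which is exactly what I do not have -- this only
    illustrates the quantity I would like d(tau, tau') to approximate.
    """
    log_ratios = [
        log_prob_pi(s, a) - log_prob_pi_prime(s, a)
        for (s, a) in trajectory
    ]
    return np.mean(log_ratios)
```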