I am thinking of applying apprenticeship learning to retrospective (previously logged) data. Looking at this paper by Abbeel and Ng (https://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf) on apprenticeship learning, it seems to me that at step 5 of the algorithm,
- Compute (or estimate) $\mu^{(i)} = \mu(\pi^{(i)})$, where $\mu(\pi) = E\left[\sum_{t=0}^{\infty} \gamma^{t} \phi(s_t) \,\middle|\, \pi\right]$ and $\phi(s_t)$ is the reward feature vector at state $s_t$.
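For concreteness, the paper estimates these feature expectations empirically by averaging discounted feature sums over $m$ sampled trajectories (it does this explicitly for the expert's $\mu_E$, and the same estimator applies to any policy given trajectories sampled from it). A minimal sketch of that estimator, where `phi` and `trajectories` are hypothetical placeholders and `phi` is assumed to return a NumPy array:

```python
import numpy as np

def estimate_mu(trajectories, phi, gamma=0.9):
    """Monte Carlo estimate of feature expectations:
    mu_hat = (1/m) * sum_i sum_t gamma^t * phi(s_t),
    averaged over m state sequences sampled under some policy."""
    m = len(trajectories)
    mu_hat = np.zeros_like(phi(trajectories[0][0]), dtype=float)
    for states in trajectories:
        for t, s in enumerate(states):
            mu_hat += gamma ** t * phi(s)
    return mu_hat / m
```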
From my understanding, a trajectory $s_0, s_1, s_2, \ldots$ would have to be generated at this step by actually executing the policy $\pi^{(i)}$ in the environment. But retrospective data only contains trajectories generated by whatever behaviour policy was in use when the data was logged, not by $\pi^{(i)}$. Hence, applying this algorithm to retrospective data would not work?
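To make the concern concrete, here is what estimating $\mu(\pi^{(i)})$ by sampling would look like if a simulator were available (a hypothetical `env` with a Gym-style `reset`/`step` interface); it is exactly this execution of $\pi^{(i)}$ that a fixed, retrospective dataset does not provide:

```python
def rollout(env, policy, horizon=1000):
    """Sample one trajectory s_0, s_1, s_2, ... by executing `policy`.
    This interaction is the part retrospective data cannot replace:
    the logged trajectories follow the original behaviour policy,
    not the newly computed pi^(i)."""
    states = []
    s = env.reset()
    for _ in range(horizon):
        states.append(s)
        s, _reward, done, _info = env.step(policy(s))
        if done:
            break
    return states

# With a simulator, step 5 would then be, e.g.:
# mu_i = estimate_mu([rollout(env, pi_i) for _ in range(m)], phi)
```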