I am thinking of applying apprenticeship learning to retrospective data. Looking at the paper by Abbeel and Ng, https://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf, which introduces apprenticeship learning, the 5th step of the algorithm reads:

  5. Compute (or estimate) $\mu^{(i)} = \mu(\pi^{(i)})$, where $\mu^{(i)} = E\left[\sum_{t=0}^{\infty} \gamma^{t} \phi(s_{t}) \mid \pi^{(i)}\right]$ and $\phi(s_{t})$ is the reward feature vector at state $s_t$.

From my understanding, a trajectory $s_0, s_1, s_2, \dots$ would have to be generated at this step by following the policy $\pi^{(i)}$. Does this mean that the algorithm cannot be applied to retrospective data?
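
To make the concern concrete, here is a minimal sketch of how I understand the estimate in that step: a Monte Carlo average of discounted feature sums over sampled trajectories, truncated to a finite horizon. The function name `estimate_feature_expectations`, the one-hot feature map, and the toy trajectories are my own assumptions for illustration and are not taken from the paper. The trajectories passed in are assumed to have been generated by following $\pi^{(i)}$, which is exactly the part I do not see how to satisfy with retrospective data.

```python
import numpy as np

def estimate_feature_expectations(trajectories, phi, gamma):
    """Monte Carlo estimate of mu = E[sum_t gamma^t phi(s_t)].

    trajectories: list of state sequences, assumed sampled under one policy.
    phi: maps a state to a 1-D feature vector (hypothetical feature map).
    gamma: discount factor in [0, 1).
    Each infinite sum is truncated at the length of the trajectory.
    """
    totals = []
    for states in trajectories:
        discounts = gamma ** np.arange(len(states))            # shape (T,)
        features = np.array([phi(s) for s in states], float)   # shape (T, d)
        totals.append(discounts @ features)                    # shape (d,)
    return np.mean(totals, axis=0)

def one_hot(s, n_states=3):
    """Toy feature map: indicator vector of the state index."""
    v = np.zeros(n_states)
    v[s] = 1.0
    return v

# Two made-up trajectories over states {0, 1, 2}.
trajectories = [[0, 1, 2], [0, 2, 2]]
mu_hat = estimate_feature_expectations(trajectories, one_hot, gamma=0.5)
print(mu_hat)
```

With logged data, the `trajectories` argument would come from whatever policy produced the log rather than from $\pi^{(i)}$, so the average would estimate the feature expectations of that logging policy instead.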

nbro
calveeen
  • This is a relatively old question, but by "retrospective data" or "prospective data" (as in the title), do you mean data generated with previous policies? I suggest that you edit your post and use more common terms to clarify your question. – nbro Oct 13 '20 at 12:05

0 Answers