
Is GAIL applicable if the expert's trajectories (sample data) are for the same task, but were collected in a different environment (a modified one, not a completely different one)?

My gut feeling is yes; otherwise, we could just adopt behavioural cloning. Furthermore, since the expert's trajectories come from a different environment, the dimension/length of the state-action pairs will most likely be different. Will those trajectories still be useful for GAIL training?


1 Answer


The authors of the paper Learning Robust Rewards with Adversarial Inverse Reinforcement Learning (2018, published at ICLR), which introduced the inverse RL technique AIRL, argue that GAIL fails to generalize to different environment dynamics. Specifically, in section 7.2 (p. 7), they describe an experiment where they disable and shrink the two front legs of the ant (in the MuJoCo Ant environment), and, based on the results, they conclude:

GAIL learns successfully in the training domain, but does not acquire a representation that is suitable for transfer to test domains.

On the other hand, according to their experiments, AIRL is more robust to changes in the environment's dynamics.
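
To make the contrast concrete, here is a minimal sketch of how the two discriminators are typically parameterised (assuming PyTorch; the class names, network sizes, and parameter names below are my own illustration, not taken from either paper). GAIL's discriminator is an unconstrained classifier over state-action pairs, so the implicit reward it induces can absorb the training environment's dynamics. AIRL instead restricts the discriminator to the form D(s, a, s') = exp(f(s, a, s')) / (exp(f(s, a, s')) + π(a|s)), with f(s, a, s') = g(s) + γh(s') − h(s), so that g can recover a state-only reward that is disentangled from the dynamics.

```python
import torch
import torch.nn as nn

class GAILDiscriminator(nn.Module):
    """GAIL: unconstrained classifier D(s, a); the induced reward
    -log(1 - D(s, a)) is entangled with the training dynamics."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return torch.sigmoid(self.net(torch.cat([state, action], dim=-1)))

class AIRLDiscriminator(nn.Module):
    """AIRL: D(s, a, s') = sigmoid(f(s, a, s') - log pi(a|s)), where
    f(s, a, s') = g(s) + gamma * h(s') - h(s); the state-only g is
    the dynamics-disentangled reward that enables transfer."""
    def __init__(self, state_dim, gamma=0.99, hidden=64):
        super().__init__()
        self.gamma = gamma
        self.g = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))  # reward approximator
        self.h = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))  # potential-based shaping

    def f(self, state, next_state):
        return self.g(state) + self.gamma * self.h(next_state) - self.h(state)

    def forward(self, state, next_state, log_pi):
        # log_pi: log-probability of the taken action under the current policy
        return torch.sigmoid(self.f(state, next_state) - log_pi.unsqueeze(-1))
```

The design choice to make g a function of the state alone is what the AIRL paper credits for its transfer results: a reward that depends on neither the action nor the transition cannot encode the training environment's dynamics, so it remains meaningful when those dynamics change (e.g. when the ant's legs are modified).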
