
Is GAIL applicable if the expert's trajectories (sample data) are for the same task, but were collected in a different environment (a modified one, not a completely different one)?

My gut feeling is yes; otherwise, we could just adopt behavioural cloning. Furthermore, since the expert's trajectories come from a different environment, the dimension/length of the state-action pairs will most likely be different. Will those trajectories still be useful for GAIL training?


1 Answer


The authors of the paper Learning Robust Rewards with Adversarial Inverse Reinforcement Learning (2018, published at ICLR), which introduced the inverse RL technique AIRL, argue that GAIL fails to generalize to different environment dynamics. Specifically, in section 7.2 (p. 7), they describe an experiment where they disable and shrink the two front legs of the ant (in the MuJoCo Ant environment), and, based on the results, they conclude:

GAIL learns successfully in the training domain, but does not acquire a representation that is suitable for transfer to test domains.

On the other hand, according to their experiments, AIRL is more robust to changes in the environment's dynamics.
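
To make the contrast concrete, here is a minimal sketch of how the two discriminators are typically parameterised (assuming PyTorch; the class names, network sizes, and parameter names below are my own illustration, not taken from either paper). GAIL's discriminator is an unconstrained classifier over state-action pairs, so the implicit reward it induces can absorb the training environment's dynamics. AIRL instead restricts the discriminator to the form D(s, a, s') = exp(f(s, a, s')) / (exp(f(s, a, s')) + π(a|s)), with f(s, a, s') = g(s) + γh(s') − h(s), so that g can recover a state-only reward that is disentangled from the dynamics.

```python
import torch
import torch.nn as nn

class GAILDiscriminator(nn.Module):
    """GAIL: unconstrained classifier D(s, a); the induced reward
    -log(1 - D(s, a)) is entangled with the training dynamics."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return torch.sigmoid(self.net(torch.cat([state, action], dim=-1)))

class AIRLDiscriminator(nn.Module):
    """AIRL: D(s, a, s') = sigmoid(f(s, a, s') - log pi(a|s)), where
    f(s, a, s') = g(s) + gamma * h(s') - h(s); the state-only g is
    the dynamics-disentangled reward that enables transfer."""
    def __init__(self, state_dim, gamma=0.99, hidden=64):
        super().__init__()
        self.gamma = gamma
        self.g = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))  # reward approximator
        self.h = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))  # potential-based shaping

    def f(self, state, next_state):
        return self.g(state) + self.gamma * self.h(next_state) - self.h(state)

    def forward(self, state, next_state, log_pi):
        # log_pi: log-probability of the taken action under the current policy
        return torch.sigmoid(self.f(state, next_state) - log_pi.unsqueeze(-1))
```

The design choice to make g a function of the state alone is what the AIRL paper credits for its transfer results: a reward that depends on neither the action nor the transition cannot encode the training environment's dynamics, so it remains meaningful when those dynamics change (e.g. when the ant's legs are modified).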
