For questions related to imitation learning (IL), a reinforcement learning technique in which a policy is learned from example trajectories of an (often optimal) agent's behavior. IL is similar to inverse reinforcement learning (IRL), in which a reward function is learned from examples of the (optimal) agent's behavior and can then be used to solve the RL problem (i.e. find the policy).
Questions tagged [imitation-learning]
14 questions
7
votes
1 answer
In imitation learning, do you simply inject optimal tuples of experience $(s, a, r, s')$ into your experience replay buffer?
Because my RL algorithm has difficulty learning some control actions, I've decided to use imitation learning/apprenticeship learning to guide it towards the optimal actions. I've read a few articles on the subject and just want to confirm…

Rui Nian
- 423
- 3
- 13
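The approach the question above asks about is often just pre-filling the replay buffer with expert transitions before (or alongside) regular training. A minimal sketch in Python, assuming a simple list-based buffer and a hypothetical `expert_trajectories` collection:

```python
import random

class ReplayBuffer:
    """Fixed-capacity FIFO buffer of (s, a, r, s') transitions."""
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)

# Pre-fill the buffer with expert transitions before RL training starts.
# `expert_trajectories` is a hypothetical list of lists of (s, a, r, s') tuples.
def inject_expert_data(buffer, expert_trajectories):
    for trajectory in expert_trajectories:
        for (s, a, r, s_next) in trajectory:
            buffer.add((s, a, r, s_next))
```

Variants such as DQfD additionally keep the expert transitions permanently in the buffer and add a supervised large-margin loss on top, rather than letting the expert data be overwritten.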
6
votes
2 answers
What is the difference between imitation learning and classification done by experts?
In short, imitation learning means learning from experts. Suppose I have a dataset labelled according to the actions of experts. I use a simple binary classifier to assess whether an expert action is good or bad.
How is…

user781486
- 201
- 1
- 5
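For contrast with the classification framing in the question above, plain behavioural cloning literally is supervised classification on expert state-action pairs. A minimal sketch with scikit-learn, using placeholder data (all names and values here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical expert dataset: states are feature vectors,
# expert_actions are the discrete actions the expert took.
states = np.random.randn(1000, 4)                 # placeholder features
expert_actions = (states[:, 0] > 0).astype(int)   # placeholder labels

# Behavioural cloning = fit a classifier from states to expert actions.
policy = LogisticRegression().fit(states, expert_actions)

# The learned "policy" is then just the classifier's prediction.
def act(state):
    return policy.predict(state.reshape(1, -1))[0]
```

The practical difference from ordinary classification shows up at deployment: the classifier's own mistakes change the distribution of states it sees next, which is exactly the compounding-error issue that imitation-learning methods like DAgger address.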
6
votes
1 answer
What does the number of required expert demonstrations in Imitation Learning depend on?
I just read the following points about the number of required expert demonstrations in imitation learning, and I'd like some clarifications. For the purpose of context, I'll be using a linear reward function throughout this post (i.e. the reward can…

stoic-santiago
- 1,121
- 5
- 18
3
votes
1 answer
Is there a standardized method to train a reinforcement learning NN by demonstration?
I'm less familiar with reinforcement learning compared to other neural network learning approaches, so I'm unaware of anything exactly like what I want for an approach. I'm wondering if there are any ways to train a Deep-Q neural network on, say,…

Daniel S.
- 33
- 3
2
votes
0 answers
How do multiple coordinate systems help in capturing invariant features?
I've been reading this paper that formulates invariant task-parametrized HSMMs. The task parameters are represented in $F$ coordinate systems defined by $\{A_j,b_j\}_{j=1}^F$, where $A_j$ denotes the rotation of the frame as an orientation matrix…

stoic-santiago
- 1,121
- 5
- 18
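For background on the frame notation in the question above: in task-parametrized models, the demonstration data $X$ is typically projected into each of the $F$ local frames before fitting, so a common formulation (which may differ in detail from the linked paper) is

$$X^{(j)} = A_j^{-1}\left(X - b_j\right), \qquad j = 1, \dots, F.$$

A feature that is invariant with respect to some frame $j$ then shows low variance in the projected data $X^{(j)}$, which is how multiple coordinate systems help the model isolate invariant structure.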
2
votes
1 answer
What is the surrogate loss function in imitation learning, and how is it different from the true cost?
I've been reading A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning lately, and I can't understand what they mean by the surrogate loss function.
Some relevant notation from the paper -
$d_\pi$ = average…

stoic-santiago
- 1,121
- 5
- 18
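As orientation for the question above (a paraphrase of the paper's setup, not a quote): the true cost of a policy over a $T$-step task accumulates under the state distribution the policy itself induces,

$$J(\pi) = T \, \mathbb{E}_{s \sim d_\pi}\big[C(s, \pi)\big],$$

but the true cost $C$ may be unobservable during training, so learning instead minimizes an observable surrogate loss $\ell(s, \pi)$, such as the expected 0-1 disagreement with the expert's action; the paper's analysis then bounds $J(\pi)$ in terms of that surrogate.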
2
votes
0 answers
Can we use imitation learning for on-policy algorithms?
Imitation learning uses the experiences of an (expert) agent to train another agent, in my understanding. If I want to use an on-policy algorithm, for example Proximal Policy Optimization, then because of its on-policy nature we cannot use the experiences…

Khush Agrawal
- 51
- 4
1
vote
1 answer
How can imitation learning data be collected?
How can imitation learning data be collected? Can I use a neural network for that? It might be noisy. Should I gather the data manually?

dato nefaridze
- 862
- 6
- 20
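The most direct answer to the question above is to record (state, action) pairs while an expert, human or scripted, controls the environment. A minimal sketch, assuming the classic Gym step API and a hypothetical `expert_action` function:

```python
def collect_demonstrations(env, expert_action, num_episodes=10):
    """Record (state, action) pairs while the expert drives the environment."""
    dataset = []
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = expert_action(state)  # human teleoperation, a script, or a trained model
            dataset.append((state, action))
            state, reward, done, info = env.step(action)
    return dataset
```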
1
vote
1 answer
Why could there be "information leak" if we do not use fixed horizons?
On the page Limitations on horizon length from the Imitation library, the authors recommend that the user stick to fixed-horizon experiments because there could be "information leak" otherwise.
I'm having trouble understanding this term: how can…

aletelecomm
- 11
- 1
1
vote
1 answer
Why not use only expert demonstrations in Imitation Learning approaches?
Some IL approaches train the agent using a specific ratio of expert demonstrations to trajectories generated by the policy being optimized.
In the specific paper I'm reading they say "we experimented with various IL proportions (10-50% by…

Samuel Rodríguez
- 11
- 3
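The proportions quoted in the question above usually govern how each training batch is composed. A minimal sketch, assuming two plain-list buffers and a hypothetical `ratio` parameter:

```python
import random

def sample_mixed_batch(expert_buffer, policy_buffer, batch_size, ratio=0.25):
    """Draw `ratio` of the batch from expert demos and the rest from the agent's own data."""
    n_expert = int(batch_size * ratio)
    batch = random.sample(expert_buffer, n_expert)
    batch += random.sample(policy_buffer, batch_size - n_expert)
    random.shuffle(batch)
    return batch
```

Keeping some self-generated trajectories in the mix is what lets the learner see, and learn to correct, its own mistakes.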
1
vote
0 answers
How to decide the size of the generated dataset in the DAgger algorithm
In the DAgger algorithm, how does one determine the number of samples required for one iteration of the training loop?
Looking at the picture above, I understand that initially, during the 1st iteration, the dataset D comes from pre-recorded samples and…

RoyJ
- 11
- 1
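For reference, the per-iteration sample count the question asks about is a free hyperparameter in DAgger rather than something the algorithm prescribes. A hedged skeleton of the loop (names like `train_classifier` and `rollout_steps` are assumptions, not the paper's notation):

```python
def dagger(env, expert_action, train_classifier, n_iterations=10, rollout_steps=1000):
    """Skeleton of DAgger; `rollout_steps` is the per-iteration sample budget."""
    dataset = []   # aggregated (state, expert_action) pairs across all iterations
    policy = None
    for _ in range(n_iterations):
        state = env.reset()
        for _ in range(rollout_steps):
            # Roll out the expert on iteration 0 and the learned policy afterwards.
            action = expert_action(state) if policy is None else policy(state)
            # Always record what the expert WOULD do in the visited state.
            dataset.append((state, expert_action(state)))
            state, reward, done, info = env.step(action)
            if done:
                state = env.reset()
        policy = train_classifier(dataset)  # retrain on the aggregated dataset
    return policy
```

In practice the budget is chosen so that each retraining sees enough newly labelled states from the current policy's own distribution to matter; the experiments in the original paper simply collect a fixed number of trajectories per iteration.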
1
vote
1 answer
Initialising DQN with weights from imitation learning rather than policy gradient network
In AlphaGo, the authors initialised a policy gradient network with weights trained from imitation learning. I believe this gives the policy gradient network a very good starting policy. The imitation network was trained on labelled data of…

calveeen
- 1,251
- 7
- 17
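In PyTorch terms, the initialisation described above is usually just a weight copy between identically shaped networks. A minimal sketch (both networks here are hypothetical stand-ins):

```python
import torch.nn as nn

def make_net():
    # Identical architecture for the imitation network and the Q/policy network.
    return nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

bc_net = make_net()   # assume this was trained with supervised learning on expert moves
dqn = make_net()

# Warm-start the second network from the imitation-trained weights.
dqn.load_state_dict(bc_net.state_dict())
```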
1
vote
1 answer
Is GAIL applicable if the expert's trajectories are for the same task but are in a different environment?
Is GAIL applicable if the expert's trajectories (sample data) are for the same task but are in a different environment (modified, but not completely different)?
My gut feeling is yes; otherwise we could simply adopt behavioural…

Sam
- 175
- 5
0
votes
1 answer
Action selection in Batch-Constrained Deep Q-learning (BCQ)
For simplicity, let's consider the discrete version of BCQ, for which the paper and the code are available. In line 5 of Algorithm 1 we have the following:
$$a' = \text{argmax}_{a' \,|\, G_{\omega}(a', s')/\max_{\hat{a}} G_{\omega}(\hat{a},…$$

HenDoNR
- 81
- 4
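A hedged reading of that line: the generative model $G_\omega$ scores how plausible each action is under the behaviour dataset, actions whose relative likelihood falls below a threshold $\tau$ are masked out, and the argmax of $Q$ is taken over the survivors. A NumPy sketch (all names are assumptions, not the authors' code):

```python
import numpy as np

def bcq_select_action(q_values, g_probs, tau=0.3):
    """Discrete BCQ-style action selection.

    q_values: Q(s', a) for every discrete action a
    g_probs:  G_omega(a | s'), the imitation model's action probabilities
    tau:      threshold on the relative likelihood G(a|s') / max_a G(a|s')
    """
    relative = g_probs / g_probs.max()
    # Mask out actions the behaviour model considers too unlikely.
    masked_q = np.where(relative >= tau, q_values, -np.inf)
    return int(np.argmax(masked_q))

# Example with 4 discrete actions:
print(bcq_select_action(np.array([1.0, 2.0, 0.5, 3.0]),
                        np.array([0.40, 0.30, 0.20, 0.10])))
```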