Questions tagged [imitation-learning]

For questions related to imitation learning (IL), a technique, related to reinforcement learning (RL), in which a policy is learned directly from examples (represented as trajectories) of an (optimal) expert's behavior. IL is similar to inverse reinforcement learning (IRL), where a reward function is learned from examples of the (optimal) expert's behavior and can then be used to solve the RL problem (i.e. find a policy).
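The simplest instance of IL is behavioural cloning: plain supervised learning on the expert's state-action pairs. A minimal sketch, assuming a discrete action space; the dimensions and the random stand-in "expert" data below are purely illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dimensions and random stand-in "expert" data, only to keep the sketch runnable.
obs_dim, n_actions = 4, 2
states = torch.randn(256, obs_dim)
expert_actions = torch.randint(0, n_actions, (256,))
expert_loader = DataLoader(TensorDataset(states, expert_actions), batch_size=32)

# Behavioural cloning = fit a policy network to the expert's choices by supervised learning.
policy_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for s, a in expert_loader:
    loss = loss_fn(policy_net(s), a)   # classification loss against the expert's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```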

14 questions
7
votes
1 answer

In imitation learning, do you simply inject optimal tuples of experience $(s, a, r, s')$ into your experience replay buffer?

Due to my RL algorithm having difficulties learning some control actions, I've decided to use imitation learning/apprenticeship learning to guide my RL to perform the optimal actions. I've read a few articles on the subject and just want to confirm…
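One common way to do what the question describes is simply to pre-fill the off-policy replay buffer with the expert's transitions. A hedged sketch; the buffer layout and `expert_transitions` are assumptions, not any library's API:

```python
import random
from collections import deque

# Hedged sketch: seed a DQN-style replay buffer with expert transitions
# (s, a, r, s', done) so that early updates already learn from the expert.
expert_transitions = [((0.0, 0.0), 1, 1.0, (0.1, 0.0), False)]   # stand-in demonstration data

replay_buffer = deque(maxlen=100_000)
replay_buffer.extend(expert_transitions)        # inject expert experience up front

# The agent's own transitions are appended during interaction as usual,
# so sampled minibatches mix expert and agent experience transparently.
def sample_batch(batch_size):
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
```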
6
votes
2 answers

What is the difference between imitation learning and classification done by experts?

In short, imitation learning means learning from experts. Suppose I have a dataset with labels based on the actions of experts. I use a simple binary classifier to assess whether an expert action is good or bad. How is…
6
votes
1 answer

What does the number of required expert demonstrations in Imitation Learning depend on?

I just read the following points about the number of required expert demonstrations in imitation learning, and I'd like some clarifications. For the purpose of context, I'll be using a linear reward function throughout this post (i.e. the reward can…
3
votes
1 answer

Is there a standardized method to train a reinforcement learning NN by demonstration?

I'm less familiar with reinforcement learning compared to other neural network learning approaches, so I'm unaware of anything exactly like what I want for an approach. I'm wondering if there are any ways to train a Deep-Q neural network on, say,…
2
votes
0 answers

How do multiple coordinate systems help in capturing invariant features?

I've been reading this paper that formulates invariant task-parametrized HSMMs. The task parameters are represented in $F$ coordinate systems defined by $\{A_j,b_j\}_{j=1}^F$, where $A_j$ denotes the rotation of the frame as an orientation matrix…
2
votes
1 answer

What is the surrogate loss function in imitation learning, and how is it different from the true cost?

I've been reading A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning lately, and I can't understand what they mean by the surrogate loss function. Some relevant notation from the paper - $d_\pi$ = average…
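For context when skimming: in that paper the true objective is the task cost accumulated over the $T$-step horizon, roughly $J(\pi) = T \, \mathbb{E}_{s \sim d_\pi}[C_\pi(s)]$, where $C_\pi(s)$ is the expected immediate task cost of $\pi$'s action in state $s$. Since the task cost is often unobservable or hard to optimise directly, one instead minimises a surrogate loss $\ell(s, \pi)$ that is available at training time, e.g. the expected 0-1 disagreement with the expert, $\ell(s, \pi) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\mathbf{1}\{a \neq \pi^{*}(s)\}\big]$ (notation hedged; the exact symbols are in the paper).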
2
votes
0 answers

Can we use imitation learning for on-policy algorithms?

Imitation learning uses the experiences of an (expert) agent to train another agent, in my understanding. If I want to use an on-policy algorithm, for example Proximal Policy Optimization, then because of its on-policy nature we cannot use the experiences…
1
vote
1 answer

How can imitation learning data be collected?

How can imitation learning data be collected? Can I use a neural network for that, even though it might be noisy, or should I gather the data manually?
1
vote
1 answer

Why could there be "information leak" if we do not use fixed horizons?

In this page, Limitations on horizon length from the Imitation library, the authors recommend that the user stick to fixed-horizon experiments because otherwise there could be an "information leak". I'm having trouble understanding this term: how can…
1
vote
1 answer

Why not use only expert demonstrations in Imitation Learning approaches?

Some IL approaches train the agents by using some specific ratio of expert demonstrations to trajectories generated using the policy being optimized. In the specific paper I'm reading they say "we experimented with various IL proportions (10-50% by…
1
vote
0 answers

How to decide the size of the generated dataset in the DAgger algorithm

In the DAgger algorithm, how does one determine the number of samples required for one iteration of the training loop? Looking at the picture above, I understand that initially, during the first iteration, the dataset D comes from pre-recorded samples and…
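For reference, a hedged sketch of the loop the question is about; how many rollout steps to collect per iteration is exactly the knob being asked about, and all helper names here are self-contained stubs, not the algorithm's official interface:

```python
import random

# Hedged sketch of the DAgger loop; `fit`, `rollout`, and `expert_label` are toy stubs.
def expert_label(state):
    return int(state > 0)                        # what the expert would do in this state

def rollout(policy, n_steps):
    return [random.uniform(-1, 1) for _ in range(n_steps)]   # states visited under `policy` (toy)

def fit(dataset):
    # Train a classifier on the aggregated (state, expert action) pairs;
    # a trivial threshold "policy" keeps the sketch short.
    return lambda s: int(s > 0.0)

# D_0: pre-recorded expert samples, as in the first iteration the question describes.
dataset = [(s, expert_label(s)) for s in rollout(None, 50)]
policy = fit(dataset)

for i in range(5):                                    # subsequent DAgger iterations
    states = rollout(policy, n_steps=50)              # visit states under the *current* policy
    dataset += [(s, expert_label(s)) for s in states] # query the expert there and aggregate
    policy = fit(dataset)                             # retrain on everything collected so far
```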
1
vote
1 answer

Initialising DQN with weights from imitation learning rather than policy gradient network

In AlphaGo, the authors initialised a policy gradient network with weights trained from imitation learning. I believe this gives the policy gradient network a very good starting policy. The imitation network was trained on labelled data of…
1
vote
1 answer

Is GAIL applicable if the expert's trajectories are for the same task but are in a different environment?

Is GAIL applicable if the expert's trajectories (sample data) are for the same task but in a different environment (modified, but not completely different)? My gut feeling is yes; otherwise, we could simply adopt behavioural…
0
votes
1 answer

Action selection in Batch-Constrained Deep Q-learning (BCQ)

For simplicity, let's consider the discrete version of BCQ, where the paper and the code are available. In line 5 of Algorithm 1 we have the following: $$ a' = \operatorname{argmax}_{a' \mid G_{\omega}(a', s')/\max_{\hat{a}} G_{\omega}(\hat{a},…
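A hedged sketch of that selection rule; the array names, shapes, and the threshold value are assumptions, not the authors' code. Actions whose probability under the generative/imitation model $G_\omega$ falls below a fraction $\tau$ of the most likely action's probability are masked out, and the greedy action with respect to $Q$ is taken among the rest:

```python
import numpy as np

# Hedged sketch of discrete-BCQ action selection: filter actions by their relative
# probability under the imitation model G_w, then act greedily w.r.t. Q among survivors.
def bcq_select_action(q_values, g_probs, tau=0.3):
    # q_values: Q(s', a) for each action; g_probs: G_w(a | s') for each action
    allowed = (g_probs / g_probs.max()) >= tau          # relative-probability threshold
    masked_q = np.where(allowed, q_values, -np.inf)     # rule out unlikely actions
    return int(np.argmax(masked_q))

# Toy usage: the middle action is too unlikely under G_w, so it is never selected.
print(bcq_select_action(np.array([1.0, 2.0, 0.5]), np.array([0.6, 0.05, 0.35])))
```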