Questions tagged [imitation-learning]

For questions related to imitation learning (IL), a technique, related to reinforcement learning (RL), in which a policy is learned directly from examples (represented as trajectories) of an (optimal) expert's behavior. IL is similar to inverse reinforcement learning (IRL), where a reward function is learned from examples of the (optimal) expert's behavior and can then be used to solve the RL problem (i.e. find a policy).
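The simplest instance of IL is behavioural cloning: plain supervised learning on the expert's state-action pairs. A minimal sketch, assuming a discrete action space; the dimensions and the random stand-in "expert" data below are purely illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dimensions and random stand-in "expert" data, only to keep the sketch runnable.
obs_dim, n_actions = 4, 2
states = torch.randn(256, obs_dim)
expert_actions = torch.randint(0, n_actions, (256,))
expert_loader = DataLoader(TensorDataset(states, expert_actions), batch_size=32)

# Behavioural cloning = fit a policy network to the expert's choices by supervised learning.
policy_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for s, a in expert_loader:
    loss = loss_fn(policy_net(s), a)   # classification loss against the expert's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```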

14 questions
7
votes
1 answer

In imitation learning, do you simply inject optimal tuples of experience $(s, a, r, s')$ into your experience replay buffer?

Due to my RL algorithm having difficulties learning some control actions, I've decided to use imitation learning/apprenticeship learning to guide my RL to perform the optimal actions. I've read a few articles on the subject and just want to confirm…
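One common way to do what the question describes is simply to pre-fill the off-policy replay buffer with the expert's transitions. A hedged sketch; the buffer layout and `expert_transitions` are assumptions, not any library's API:

```python
import random
from collections import deque

# Hedged sketch: seed a DQN-style replay buffer with expert transitions
# (s, a, r, s', done) so that early updates already learn from the expert.
expert_transitions = [((0.0, 0.0), 1, 1.0, (0.1, 0.0), False)]   # stand-in demonstration data

replay_buffer = deque(maxlen=100_000)
replay_buffer.extend(expert_transitions)        # inject expert experience up front

# The agent's own transitions are appended during interaction as usual,
# so sampled minibatches mix expert and agent experience transparently.
def sample_batch(batch_size):
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
```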
6
votes
2 answers

What is the difference between imitation learning and classification done by experts?

In short, imitation learning means learning from experts. Suppose I have a dataset with labels based on the actions of experts. I use a simple binary classifier to assess whether an expert action is good or bad. How is…
6
votes
1 answer

What does the number of required expert demonstrations in Imitation Learning depend on?

I just read the following points about the number of required expert demonstrations in imitation learning, and I'd like some clarifications. For the purpose of context, I'll be using a linear reward function throughout this post (i.e. the reward can…
3
votes
1 answer

Is there a standardized method to train a reinforcement learning NN by demonstration?

I'm less familiar with reinforcement learning compared to other neural network learning approaches, so I'm unaware of anything exactly like what I want for an approach. I'm wondering if there are any ways to train a Deep-Q neural network on, say,…
2
votes
0 answers

How do multiple coordinate systems help in capturing invariant features?

I've been reading this paper that formulates invariant task-parametrized HSMMs. The task parameters are represented in $F$ coordinate systems defined by $\{A_j,b_j\}_{j=1}^F$, where $A_j$ denotes the rotation of the frame as an orientation matrix…
2
votes
1 answer

What is the surrogate loss function in imitation learning, and how is it different from the true cost?

I've been reading A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning lately, and I can't understand what they mean by the surrogate loss function. Some relevant notation from the paper - $d_\pi$ = average…
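For context when skimming: in that paper the true objective is the task cost accumulated over the $T$-step horizon, roughly $J(\pi) = T \, \mathbb{E}_{s \sim d_\pi}[C_\pi(s)]$, where $C_\pi(s)$ is the expected immediate task cost of $\pi$'s action in state $s$. Since the task cost is often unobservable or hard to optimise directly, one instead minimises a surrogate loss $\ell(s, \pi)$ that is available at training time, e.g. the expected 0-1 disagreement with the expert, $\ell(s, \pi) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\mathbf{1}\{a \neq \pi^{*}(s)\}\big]$ (notation hedged; the exact symbols are in the paper).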
2
votes
0 answers

Can we use imitation learning for on-policy algorithms?

Imitation learning uses the experiences of an (expert) agent to train another agent, in my understanding. If I want to use an on-policy algorithm, for example Proximal Policy Optimization, then because of its on-policy nature we cannot use the experiences…
1
vote
1 answer

How can imitation learning data be collected?

How can imitation learning data be collected? Can I use a neural network for that, even though it might be noisy, or should I gather the data manually?
1
vote
1 answer

Why could there be "information leak" if we do not use fixed horizons?

In this page, Limitations on horizon length from the Imitation library, the authors recommend that the user stick to fixed-horizon experiments because otherwise there could be an "information leak". I'm having trouble understanding this term: how can…
1
vote
1 answer

Why not use only expert demonstrations in Imitation Learning approaches?

Some IL approaches train the agents by using some specific ratio of expert demonstrations to trajectories generated using the policy being optimized. In the specific paper I'm reading they say "we experimented with various IL proportions (10-50% by…
1
vote
0 answers

How to decide the size of the generated dataset in the DAgger algorithm

In the DAgger algorithm, how does one determine the number of samples required for one iteration of the training loop? Looking at the picture above, I understand that initially, during the first iteration, the dataset D comes from pre-recorded samples and…
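For reference, a hedged sketch of the loop the question is about; how many rollout steps to collect per iteration is exactly the knob being asked about, and all helper names here are self-contained stubs, not the algorithm's official interface:

```python
import random

# Hedged sketch of the DAgger loop; `fit`, `rollout`, and `expert_label` are toy stubs.
def expert_label(state):
    return int(state > 0)                        # what the expert would do in this state

def rollout(policy, n_steps):
    return [random.uniform(-1, 1) for _ in range(n_steps)]   # states visited under `policy` (toy)

def fit(dataset):
    # Train a classifier on the aggregated (state, expert action) pairs;
    # a trivial threshold "policy" keeps the sketch short.
    return lambda s: int(s > 0.0)

# D_0: pre-recorded expert samples, as in the first iteration the question describes.
dataset = [(s, expert_label(s)) for s in rollout(None, 50)]
policy = fit(dataset)

for i in range(5):                                    # subsequent DAgger iterations
    states = rollout(policy, n_steps=50)              # visit states under the *current* policy
    dataset += [(s, expert_label(s)) for s in states] # query the expert there and aggregate
    policy = fit(dataset)                             # retrain on everything collected so far
```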
1
vote
1 answer

Initialising DQN with weights from imitation learning rather than policy gradient network

In AlphaGo, the authors initialised a policy gradient network with weights trained from imitation learning. I believe this gives the policy gradient network a very good starting policy. The imitation network was trained on labelled data of…
1
vote
1 answer

Is GAIL applicable if the expert's trajectories are for the same task but are in a different environment?

Is GAIL applicable if the expert's trajectories (sample data) are for the same task but in a different environment (modified, but not completely different)? My gut feeling is yes; otherwise, we could simply adopt behavioural…
0
votes
1 answer

Action selection in Batch-Constrained Deep Q-learning (BCQ)

For simplicity, let's consider the discrete version of BCQ, where the paper and the code are available. In line 5 of Algorithm 1 we have the following: $$ a' = \operatorname{argmax}_{a' \mid G_{\omega}(a', s')/\max_{\hat{a}} G_{\omega}(\hat{a},…
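A hedged sketch of that selection rule; the array names, shapes, and the threshold value are assumptions, not the authors' code. Actions whose probability under the generative/imitation model $G_\omega$ falls below a fraction $\tau$ of the most likely action's probability are masked out, and the greedy action with respect to $Q$ is taken among the rest:

```python
import numpy as np

# Hedged sketch of discrete-BCQ action selection: filter actions by their relative
# probability under the imitation model G_w, then act greedily w.r.t. Q among survivors.
def bcq_select_action(q_values, g_probs, tau=0.3):
    # q_values: Q(s', a) for each action; g_probs: G_w(a | s') for each action
    allowed = (g_probs / g_probs.max()) >= tau          # relative-probability threshold
    masked_q = np.where(allowed, q_values, -np.inf)     # rule out unlikely actions
    return int(np.argmax(masked_q))

# Toy usage: the middle action is too unlikely under G_w, so it is never selected.
print(bcq_select_action(np.array([1.0, 2.0, 0.5]), np.array([0.6, 0.05, 0.35])))
```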