Questions tagged [inverse-rl]

For questions related to inverse reinforcement learning (IRL), the problem of recovering an agent's reward function from its observed behavior (or policy). It is called IRL because it inverts the standard RL problem of finding an optimal policy given the reward function.
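In symbols, a rough sketch of the two directions (assuming a standard MDP with states $s_t$, actions $a_t$, dynamics $P$ and discount $\gamma$): forward RL takes the reward $R$ as given and solves
$$\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t} R(s_t, a_t)\right],$$
whereas IRL takes expert behavior $\pi^{E}$ (or demonstrations sampled from it) as given and looks for a reward $R$ under which $\pi^{E}$ is (near-)optimal.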

13 questions
7 votes · 2 answers

What are some best practices when trying to design a reward function?

Generally speaking, is there a best-practice procedure to follow when trying to define a reward function for a reinforcement-learning agent? What common pitfalls are there when defining the reward function, and how should you avoid them? What…
6 votes · 1 answer

What does the number of required expert demonstrations in Imitation Learning depend on?

I just read the following points about the number of required expert demonstrations in imitation learning, and I'd like some clarifications. For the purpose of context, I'll be using a linear reward function throughout this post (i.e. the reward can…
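For reference, "linear reward" in these questions usually means the reward is a weighted combination of a fixed feature map (generic notation, not taken from the post itself):
$$R(s) = w^{\top}\phi(s), \qquad w, \phi(s) \in \mathbb{R}^{k},$$
so recovering the reward reduces to recovering the weight vector $w$.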
4 votes · 1 answer

Can recovering a reward function using IRL lead to better policies compared to reward shaping?

I am working on a research project about the different reward functions being used in the RL domain. I have read up on Inverse Reinforcement Learning (IRL) and Reward Shaping (RS). I would like to clarify some doubts that I have with the 2…
3 votes · 1 answer

Reward design or Inverse reinforcement learning?

I'm working on a reinforcement learning project where I only have demonstrations (i.e. sets of states and actions). During my research on how to handle the reward signal, I noticed that research papers often design their reward functions based on…
3 votes · 1 answer

Expressing Arbitrary Reward Functions as Potential-Based Advice (PBA)

I am trying to reproduce the results for the simple grid-world environment in [1]. But it turns out that using a dynamically learned PBA makes the performance worse and I cannot obtain the results shown in Figure 1 (a) in [1] (with the same…
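For context, the shaping terms this line of work builds on are, as far as I recall, the potential-based form of Ng et al. and the state-action "advice" form of Wiewiora et al.:
$$F(s, a, s') = \gamma \Phi(s') - \Phi(s), \qquad F(s, a, s', a') = \gamma \Phi(s', a') - \Phi(s, a),$$
where $\Phi$ is the (possibly dynamically learned) potential function.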
2 votes · 1 answer

Why is it that the state visitation frequency equals the sum of state visitation probabilities from the initial time step to the horizon?

In the maximum entropy inverse reinforcement learning paper, Ziebart et al. show that the state visitation frequency $\rho(s)$ of a state $s$ can be computed as $$ \rho_{\pi}(s) = \sum_{t}^{T} P(s_t=s|\pi), $$ which is the sum of the probability…
skypitcher
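A minimal sketch of how those per-time-step probabilities are typically accumulated with a forward pass, assuming a small tabular MDP with a transition tensor P[s, a, s'], a stochastic policy pi[s, a], an initial distribution p0 and horizon T (all names here are placeholders, not from the paper):

    import numpy as np

    def state_visitation_frequency(P, pi, p0, T):
        """P: (S, A, S) transitions, pi: (S, A) policy, p0: (S,) initial distribution."""
        S = P.shape[0]
        d = np.zeros((T, S))
        d[0] = p0  # P(s_0 = s)
        for t in range(1, T):
            # P(s_t = s) = sum_{s', a} P(s_{t-1} = s') * pi(a | s') * P(s | s', a)
            d[t] = np.einsum("s,sa,sak->k", d[t - 1], pi, P)
        return d.sum(axis=0)  # rho_pi(s) = sum over t of P(s_t = s | pi)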
2 votes · 0 answers

What is the dimensionality of these derivatives in the paper "Active Learning for Reward Estimation in Inverse Reinforcement Learning"?

I'm trying to implement in code part of the following paper: Active Learning for Reward Estimation in Inverse Reinforcement Learning. I'm specifically referring to section 2.3 of the paper. Let's define $\mathcal{X}$ as the set of states, and…
1 vote · 0 answers

Can I use a dataset with real-world images and corresponding actions that the expert took to train an IRL algorithm?

Offline Reinforcement Learning approaches like Inverse Reinforcement Learning, Batch RL, imitation learning, and behavior cloning allow us to use previous demonstrations by an expert to learn a policy. Many of the papers that I have found use expert…
1 vote · 0 answers

What do state features mean in the context of inverse RL?

I am reading Ziebart's Inverse RL paper, and it states: The agent is assumed to be attempting to optimize some function that linearly maps the features of each state, $f_{s_j} \in \mathbb{R}^k$, to a state reward value representing the agent's…
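As a concrete illustration of such features (a made-up gridworld example, not from the paper): each state gets a k-dimensional feature vector, and the reward is assumed to be linear in it.

    import numpy as np

    # Hypothetical features per state: [is_goal, is_lava, is_carpet]
    features = {
        "s0": np.array([0.0, 0.0, 1.0]),  # carpet cell
        "s1": np.array([0.0, 1.0, 0.0]),  # lava cell
        "s2": np.array([1.0, 0.0, 0.0]),  # goal cell
    }
    theta = np.array([10.0, -5.0, 0.1])   # weights the IRL algorithm tries to recover

    # Linear reward: R(s) = theta . f_s
    reward = {s: float(theta @ f) for s, f in features.items()}
    print(reward)  # {'s0': 0.1, 's1': -5.0, 's2': 10.0}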
1 vote · 1 answer

Can entire neural networks be composed of only activation functions?

Inverse Reinforcement Learning based on GAIL and GAN-Guided Cost Learning (GAN-GCL) uses a discriminator to classify between expert demos and policy-generated samples. Adversarial IRL, built upon GAN-GCL, has its discriminator $D_{\theta, \phi}$ as…
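For context, the AIRL discriminator has, as far as I recall from Fu et al., the form
$$D_{\theta, \phi}(s, a, s') = \frac{\exp\{f_{\theta, \phi}(s, a, s')\}}{\exp\{f_{\theta, \phi}(s, a, s')\} + \pi(a \mid s)}, \qquad f_{\theta, \phi}(s, a, s') = g_{\theta}(s, a) + \gamma h_{\phi}(s') - h_{\phi}(s),$$
with $g_{\theta}$ the reward approximator and $h_{\phi}$ a shaping term.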
0 votes · 0 answers

Augmenting an image with other data when training a CNN

In the typical RL/MDP framework, I have offline data of $(s,a,r,s')$ of expert Atari gameplay. I'm looking to train a CNN to predict $r$ based on $(s, a)$. The states are represented by a $4 \times 84 \times 84$ image of the Atari screen, where 4…
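One common pattern for this kind of setup (just a sketch, assuming PyTorch, a discrete action space of size n_actions, and frames shaped 4x84x84; none of these names are from the post) is to encode the image stack with a CNN and concatenate a one-hot action encoding before the regression head:

    import torch
    import torch.nn as nn

    class RewardNet(nn.Module):
        def __init__(self, n_actions: int):
            super().__init__()
            # Atari-style encoder for a 4x84x84 frame stack
            self.encoder = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
            )
            self.head = nn.Sequential(
                nn.Linear(64 * 7 * 7 + n_actions, 256), nn.ReLU(),
                nn.Linear(256, 1),  # predicted reward r(s, a)
            )
            self.n_actions = n_actions

        def forward(self, frames, actions):
            # frames: (B, 4, 84, 84) float tensor, actions: (B,) integer action ids
            z = self.encoder(frames)
            a = nn.functional.one_hot(actions, self.n_actions).float()
            return self.head(torch.cat([z, a], dim=1)).squeeze(-1)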
0 votes · 0 answers

Proving the existence or non-existence of a reward function that makes a given policy "uniquely" optimal when the reward depends only on S or on both S and A

I was going through the paper titled "Algorithms for Inverse Reinforcement Learning" by Andrew Ng and Stuart Russell. It states the following basics: an MDP $M$ is a tuple $(S,A,\{P_{sa}\},\gamma,R)$, where $S$ is a finite set of $N$ states; $A=\{a_1,...,a_k\}$ is…
Rnj
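For reference, the characterization from that paper that these existence questions revolve around (stated from memory, so double-check against the paper): with a reward that depends only on the state, the policy $\pi(s) \equiv a_1$ is optimal if and only if, for every action $a$,
$$(\mathbf{P}_{a_1} - \mathbf{P}_{a})(\mathbf{I} - \gamma \mathbf{P}_{a_1})^{-1}\mathbf{R} \succeq 0,$$
where $\mathbf{P}_a$ is the $N \times N$ transition matrix under action $a$ and $\mathbf{R}$ is the reward vector.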
0 votes · 1 answer

How to make an input variable a trainable parameter in a neural network?

I am working on an optimization problem. First, I did forward training so that the network works as a surrogate model; then I freeze the network's weights and want to find the optimal value of the input for a given output.
Preetz
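A minimal sketch of the usual approach (assuming PyTorch, a pretrained network net, and a scalar target y_target; all names are placeholders): freeze the network's parameters and register the input itself as the quantity the optimizer updates.

    import torch

    def invert_surrogate(net, y_target, input_dim, steps=500, lr=1e-2):
        # Freeze the surrogate model's weights
        for p in net.parameters():
            p.requires_grad_(False)

        # The input is now the only "trainable parameter"
        x = torch.randn(1, input_dim, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)

        for _ in range(steps):
            opt.zero_grad()
            loss = (net(x) - y_target).pow(2).mean()  # match the desired output
            loss.backward()                           # gradients flow into x only
            opt.step()
        return x.detach()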