Questions tagged [inverse-rl]

For questions related to inverse reinforcement learning (IRL), the problem of recovering an agent's reward function from its observed behavior (or policy). It is called IRL because it inverts the standard RL problem of finding an optimal policy given the reward function.
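In symbols, a rough sketch of the two directions (assuming a standard MDP with states $s_t$, actions $a_t$, dynamics $P$ and discount $\gamma$): forward RL takes the reward $R$ as given and solves
$$\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t} R(s_t, a_t)\right],$$
whereas IRL takes expert behavior $\pi^{E}$ (or demonstrations sampled from it) as given and looks for a reward $R$ under which $\pi^{E}$ is (near-)optimal.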

13 questions
7 votes · 2 answers

What are some best practices when trying to design a reward function?

Generally speaking, is there a best-practice procedure to follow when trying to define a reward function for a reinforcement-learning agent? What common pitfalls are there when defining the reward function, and how should you avoid them? What…
6 votes · 1 answer

What does the number of required expert demonstrations in Imitation Learning depend on?

I just read the following points about the number of required expert demonstrations in imitation learning, and I'd like some clarifications. For the purpose of context, I'll be using a linear reward function throughout this post (i.e. the reward can…
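For reference, "linear reward" in these questions usually means the reward is a weighted combination of a fixed feature map (generic notation, not taken from the post itself):
$$R(s) = w^{\top}\phi(s), \qquad w, \phi(s) \in \mathbb{R}^{k},$$
so recovering the reward reduces to recovering the weight vector $w$.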
4 votes · 1 answer

Can recovering a reward function using IRL lead to better policies compared to reward shaping?

I am working on a research project about the different reward functions being used in the RL domain. I have read up on Inverse Reinforcement Learning (IRL) and Reward Shaping (RS). I would like to clarify some doubts that I have with the 2…
3 votes · 1 answer

Reward design or Inverse reinforcement learning?

I'm working on a reinforcement learning project where I only have demonstrations (i.e. sets of states and actions). During my research on how to handle the reward signal, I noticed that research papers often design their reward functions based on…
3 votes · 1 answer

Expressing Arbitrary Reward Functions as Potential-Based Advice (PBA)

I am trying to reproduce the results for the simple grid-world environment in [1]. But it turns out that using a dynamically learned PBA makes the performance worse and I cannot obtain the results shown in Figure 1 (a) in [1] (with the same…
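For context, the shaping terms this line of work builds on are, as far as I recall, the potential-based form of Ng et al. and the state-action "advice" form of Wiewiora et al.:
$$F(s, a, s') = \gamma \Phi(s') - \Phi(s), \qquad F(s, a, s', a') = \gamma \Phi(s', a') - \Phi(s, a),$$
where $\Phi$ is the (possibly dynamically learned) potential function.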
2 votes · 1 answer

Why is it that the state visitation frequency equals the sum of state visitation probabilities from the initial time step to the horizon?

In the maximum entropy inverse reinforcement learning paper, Ziebart et al. show that the state visitation frequency $\rho(s)$ of a state $s$ can be computed as $$ \rho_{\pi}(s) = \sum_{t}^{T} P(s_t=s|\pi), $$ which is the sum of the probability…
skypitcher
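A minimal sketch of how those per-time-step probabilities are typically accumulated with a forward pass, assuming a small tabular MDP with a transition tensor P[s, a, s'], a stochastic policy pi[s, a], an initial distribution p0 and horizon T (all names here are placeholders, not from the paper):

    import numpy as np

    def state_visitation_frequency(P, pi, p0, T):
        """P: (S, A, S) transitions, pi: (S, A) policy, p0: (S,) initial distribution."""
        S = P.shape[0]
        d = np.zeros((T, S))
        d[0] = p0  # P(s_0 = s)
        for t in range(1, T):
            # P(s_t = s) = sum_{s', a} P(s_{t-1} = s') * pi(a | s') * P(s | s', a)
            d[t] = np.einsum("s,sa,sak->k", d[t - 1], pi, P)
        return d.sum(axis=0)  # rho_pi(s) = sum over t of P(s_t = s | pi)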
2 votes · 0 answers

What is the dimensionality of these derivatives in the paper "Active Learning for Reward Estimation in Inverse Reinforcement Learning"?

I'm trying to implement in code part of the following paper: Active Learning for Reward Estimation in Inverse Reinforcement Learning. I'm specifically referring to section 2.3 of the paper. Let's define $\mathcal{X}$ as the set of states, and…
1 vote · 0 answers

Can I use a dataset with real-world images and corresponding actions that the expert took to train an IRL algorithm?

Offline Reinforcement Learning approaches like Inverse Reinforcement Learning, Batch RL, imitation learning, and behavior cloning allow us to use previous demonstrations by an expert to learn a policy. Many of the papers that I have found use expert…
1 vote · 0 answers

What do state features mean in the context of inverse RL?

I am reading Ziebart's Inverse RL paper, and it states: The agent is assumed to be attempting to optimize some function that linearly maps the features of each state, $f_{s_j} \in \mathbb{R}^k$, to a state reward value representing the agent's…
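As a concrete illustration of such features (a made-up gridworld example, not from the paper): each state gets a k-dimensional feature vector, and the reward is assumed to be linear in it.

    import numpy as np

    # Hypothetical features per state: [is_goal, is_lava, is_carpet]
    features = {
        "s0": np.array([0.0, 0.0, 1.0]),  # carpet cell
        "s1": np.array([0.0, 1.0, 0.0]),  # lava cell
        "s2": np.array([1.0, 0.0, 0.0]),  # goal cell
    }
    theta = np.array([10.0, -5.0, 0.1])   # weights the IRL algorithm tries to recover

    # Linear reward: R(s) = theta . f_s
    reward = {s: float(theta @ f) for s, f in features.items()}
    print(reward)  # {'s0': 0.1, 's1': -5.0, 's2': 10.0}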
1 vote · 1 answer

Can entire neural networks be composed of only activation functions?

Inverse Reinforcement Learning based on GAIL and GAN-Guided Cost Learning (GAN-GCL) uses a discriminator to classify between expert demos and policy-generated samples. Adversarial IRL, built upon GAN-GCL, has its discriminator $D_{\theta, \phi}$ as…
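For context, the AIRL discriminator has, as far as I recall from Fu et al., the form
$$D_{\theta, \phi}(s, a, s') = \frac{\exp\{f_{\theta, \phi}(s, a, s')\}}{\exp\{f_{\theta, \phi}(s, a, s')\} + \pi(a \mid s)}, \qquad f_{\theta, \phi}(s, a, s') = g_{\theta}(s, a) + \gamma h_{\phi}(s') - h_{\phi}(s),$$
with $g_{\theta}$ the reward approximator and $h_{\phi}$ a shaping term.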
0 votes · 0 answers

Augmenting an image with other data when training a CNN

In the typical RL/MDP framework, I have offline data of $(s,a,r,s')$ of expert Atari gameplay. I'm looking to train a CNN to predict $r$ based on $(s, a)$. The states are represented by a $4 \times 84 \times 84$ image of the Atari screen, where 4…
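One common pattern for this kind of setup (just a sketch, assuming PyTorch, a discrete action space of size n_actions, and frames shaped 4x84x84; none of these names are from the post) is to encode the image stack with a CNN and concatenate a one-hot action encoding before the regression head:

    import torch
    import torch.nn as nn

    class RewardNet(nn.Module):
        def __init__(self, n_actions: int):
            super().__init__()
            # Atari-style encoder for a 4x84x84 frame stack
            self.encoder = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
            )
            self.head = nn.Sequential(
                nn.Linear(64 * 7 * 7 + n_actions, 256), nn.ReLU(),
                nn.Linear(256, 1),  # predicted reward r(s, a)
            )
            self.n_actions = n_actions

        def forward(self, frames, actions):
            # frames: (B, 4, 84, 84) float tensor, actions: (B,) integer action ids
            z = self.encoder(frames)
            a = nn.functional.one_hot(actions, self.n_actions).float()
            return self.head(torch.cat([z, a], dim=1)).squeeze(-1)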
0 votes · 0 answers

Proving the existence or non-existence of a reward function that makes a given policy "uniquely" optimal when the reward depends only on S or on both S and A

I was going through the paper titled "Algorithms for Inverse Reinforcement Learning" by Andrew Ng and Stuart Russell. It states the following basics: an MDP $M$ is a tuple $(S,A,\{P_{sa}\},\gamma,R)$, where $S$ is a finite set of $N$ states; $A=\{a_1,...,a_k\}$ is…
Rnj
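For reference, the characterization from that paper that these existence questions revolve around (stated from memory, so double-check against the paper): with a reward that depends only on the state, the policy $\pi(s) \equiv a_1$ is optimal if and only if, for every action $a$,
$$(\mathbf{P}_{a_1} - \mathbf{P}_{a})(\mathbf{I} - \gamma \mathbf{P}_{a_1})^{-1}\mathbf{R} \succeq 0,$$
where $\mathbf{P}_a$ is the $N \times N$ transition matrix under action $a$ and $\mathbf{R}$ is the reward vector.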
0 votes · 1 answer

How to make an input variable a trainable parameter in a neural network?

I am working on an optimization problem. First, I did forward training so that the network works as a surrogate model; then I freeze the network's weights and want to find the optimal value of the input for a given output.
Preetz
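A minimal sketch of the usual approach (assuming PyTorch, a pretrained network net, and a scalar target y_target; all names are placeholders): freeze the network's parameters and register the input itself as the quantity the optimizer updates.

    import torch

    def invert_surrogate(net, y_target, input_dim, steps=500, lr=1e-2):
        # Freeze the surrogate model's weights
        for p in net.parameters():
            p.requires_grad_(False)

        # The input is now the only "trainable parameter"
        x = torch.randn(1, input_dim, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)

        for _ in range(steps):
            opt.zero_grad()
            loss = (net(x) - y_target).pow(2).mean()  # match the desired output
            loss.backward()                           # gradients flow into x only
            opt.step()
        return x.detach()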