How can imitation learning data be collected? Can I use a neural network for that, even though it might be noisy, or should I gather the data manually?
- Imitation data of what process? – Rexcirus Jan 03 '23 at 20:25
- @Rexcirus What do you mean? Suppose I am training a DNN on CartPole; how can I collect data for CartPole? – dato nefaridze Jan 03 '23 at 20:42
1 Answer
Imitation learning data usually means data gathered from an expert, that is, from an agent that is proficient in the task.
The agent may be:
- A human operator: have the operator complete the task and record the inputs and actions taken.
- A pre-trained reinforcement learning agent: same as above.
The collected data has exactly the same form in both cases.
The hard part is usually building an interface to collect this data, which is very task-specific. For instance, imitation learning for robotics may require expensive sensors. For CartPole and similar RL environments, have a look at https://github.com/HumanCompatibleAI/imitation; it provides scripts to learn from RL agents.
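As a rough sketch of what such data collection might look like in code (my own illustration, not part of the original answer), here is one way to roll out an expert on CartPole with Gymnasium and record observation/action pairs. The `expert_policy` function is a stand-in for whatever expert you have, whether a human interface or a pre-trained RL agent; the recorded data looks the same either way.

```python
# Minimal sketch: collect (observation, action) pairs from an expert on CartPole.
import gymnasium as gym
import numpy as np

def expert_policy(obs):
    # Placeholder expert: a simple hand-coded controller for CartPole
    # (push in the direction the pole is falling).
    angle, angular_velocity = obs[2], obs[3]
    return 1 if angle + angular_velocity > 0 else 0

env = gym.make("CartPole-v1")
observations, actions = [], []

for episode in range(10):
    obs, info = env.reset()
    done = False
    while not done:
        action = expert_policy(obs)
        observations.append(obs)
        actions.append(action)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated

# The resulting arrays are the imitation dataset: states paired with expert actions.
dataset = {"obs": np.array(observations), "acts": np.array(actions)}
```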

Rexcirus
- @Rexcirus I was wondering whether humans could do that, because if the task is for a robotic arm to pick and place an object, humans might do it with two different trajectories. When those trajectories get into the training set, the DNN would be "confused", because it would not know which path to optimize for (a slight deviation is okay, but not huge ones). – dato nefaridze Jan 04 '23 at 11:22
- That is why I said it's hard. The easier path for development is to have the humans use the same interface as the robot, for instance teleoperating a robot arm with a joystick and using the robotic joint positions and velocities as data. But this may result in suboptimal demonstrations, since humans do not optimise the way a robot could move. – Rexcirus Jan 04 '23 at 13:59
- Regarding having multiple trajectories for the same goal: that is fine, and it actually makes your dataset richer. Of course, you need to use smarter learning strategies than simply averaging trajectories. But that is why you should use existing imitation learning libraries. – Rexcirus Jan 04 '23 at 14:02
- No, it's not easy. Imagine you picked up the cube via the left trajectory once, and then did the same but via the right trajectory. Now the robot is reset: which way should it go? One time you told it to go left in the given state, and the other time you told it to go right in the same state. – dato nefaridze Jan 04 '23 at 14:21
- It's up to you either to provide more demonstrations or to provide only coherent demonstrations (e.g. only pick the cube from the left). The first option is more general. – Rexcirus Jan 04 '23 at 14:39