
Here's the general algorithm of maximum entropy inverse reinforcement learning.

[Image: the maximum entropy inverse reinforcement learning algorithm]

This uses a gradient descent algorithm. What I don't understand is that there appears to be only a single gradient value, $\nabla_\theta \mathcal{L}$, and it is used to update a whole vector of parameters. This doesn't make sense to me, because it seems to update every element of the vector with the same value $\nabla_\theta \mathcal{L}$. Can you explain the logic behind updating a vector with a single gradient?

nbro

1 Answer


This is a standard gradient-based update, the same kind used when training with backpropagation. The gradient term $\nabla_\theta \mathcal{L}$ is in fact a vector of partial derivatives: its $i$-th element is the partial derivative of the log-likelihood $\mathcal{L}$ with respect to the $i$-th element of the parameter vector $\theta$. It therefore has the same dimensionality as $\theta$. Each element of the parameter vector is updated with its own corresponding partial derivative, and these values are generally not the same.
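To make this concrete, here is a minimal sketch of one update step. It assumes the common case where the reward is linear in the features, in which case the gradient of the log-likelihood is the expert's feature expectations minus the feature expectations under the current reward. The names `mu_expert` and `mu_model`, the learning rate, and all of the numbers are made up for illustration; they are not taken from the algorithm in the question.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
theta = np.array([0.5, -1.0, 2.0])      # parameter vector, shape (3,)
mu_expert = np.array([0.8, 0.1, 0.4])   # expert feature expectations (made up)
mu_model = np.array([0.5, 0.3, 0.4])    # feature expectations under the current reward (made up)

# Assumption: linear reward, so the gradient of the log-likelihood is
# a vector with one partial derivative per element of theta.
grad = mu_expert - mu_model             # shape (3,), same as theta

learning_rate = 0.1                     # hypothetical step size

# Element-wise update: theta[i] moves by learning_rate * grad[i].
# This is ascent on the log-likelihood, which is the same as descent
# on the negative log-likelihood.
theta_new = theta + learning_rate * grad

print(grad.shape == theta.shape)
print(grad)
print(theta_new)
```

Here the three parameters change by three different amounts (the third does not change at all, because its partial derivative is zero), even though the update is written with the single symbol $\nabla_\theta \mathcal{L}$.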

nbro
cantordust