
I am attempting to fully understand the explicit derivation and computation of the Hessian and how it is used in MAML. I came across this blog: https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html.

Specifically, could someone clarify this for me: is the term in the red box literally interpreted as the gradient at $\theta_{k-1}$ multiplied by $\theta_k$?

[image: excerpt from the linked blog post showing the MAML gradient derivation, with the term in question highlighted in a red box]

nbro
Blake Camp

1 Answer


$\nabla_{\theta_{k-1}} \theta_k$ is the gradient (more precisely, the Jacobian) of $\theta_k$ with respect to $\theta_{k-1}$; it follows from the chain rule, as noted in the side comment in the image. $\nabla_{\theta} \mathcal L(\theta_k)$ is likewise a gradient vector, not a Hessian.
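To make this concrete, here is a minimal numerical sketch (the quadratic loss, learning rate, and variable names are illustrative assumptions, not from the blog post). Since one inner SGD step is $\theta_k = \theta_{k-1} - \alpha \nabla \mathcal L(\theta_{k-1})$, its Jacobian is $\nabla_{\theta_{k-1}} \theta_k = I - \alpha \nabla^2 \mathcal L(\theta_{k-1})$; this is exactly where the Hessian enters MAML's meta-gradient.

```python
import numpy as np

# Toy loss L(theta) = 0.5 * ||theta||^2 + sum(theta), chosen so that
# the gradient (theta + 1) and Hessian (the identity) are easy to check.
def grad_loss(theta):
    return theta + 1.0

alpha = 0.1  # inner-loop learning rate (assumed)

# One inner SGD step: theta_k = theta_{k-1} - alpha * grad L(theta_{k-1})
def inner_step(theta):
    return theta - alpha * grad_loss(theta)

theta_prev = np.array([1.0, -2.0, 3.0])

# Estimate the Jacobian d(theta_k)/d(theta_{k-1}) by central finite differences.
eps = 1e-6
n = theta_prev.size
J = np.zeros((n, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    J[:, j] = (inner_step(theta_prev + e) - inner_step(theta_prev - e)) / (2 * eps)

# For this loss the Hessian is the identity, so the Jacobian should be
# I - alpha * I = (1 - alpha) * I.
print(np.allclose(J, (1 - alpha) * np.eye(n), atol=1e-4))  # True
```

So the term in the red box is not "a gradient times $\theta_k$" but a full Jacobian matrix, and differentiating through the inner update is what makes MAML second-order.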

Brale