
I'm trying to implement the paper "A Curiosity Algorithm for Robots Based on the Free Energy Principle" in a reinforcement learning environment using PyTorch, but I am unclear about how curiosity is calculated.

As the paper describes, I made a 'transitioner' which predicts the next state given the current state and action. One layer of that transitioner is a Bayesian linear layer from Blitz; the probability distribution of the weights in that layer is $q_\psi(w) = \mathcal{N}(w \mid \mu, \sigma^2)$, where $\psi = \{\mu, \sigma\}$. Curiosity is then defined as the Kullback–Leibler divergence $D_{KL}[q_\psi(w \mid s_{t+1}) \,\|\, q_\psi(w)]$.
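
For reference, here is a minimal sketch of such a transitioner. The hidden size, the ReLUs, and the sandwiching of one Bayesian layer between ordinary linear layers are my own placeholder assumptions, not taken from the paper; the Blitz import and `BayesianLinear` module are the library's actual API.

```python
import torch
import torch.nn as nn
from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

@variational_estimator  # adds nn_kl_divergence(), used in the next sketch
class Transitioner(nn.Module):
    """Predicts s_{t+1} from (s_t, a_t); the middle layer is Bayesian."""
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.encode = nn.Linear(state_dim + action_dim, hidden_dim)
        # Weights of this layer follow N(mu, sigma^2); Blitz stores psi as
        # (weight_mu, weight_rho) with sigma = log(1 + exp(rho)).
        self.bayes = BayesianLinear(hidden_dim, hidden_dim)
        self.decode = nn.Linear(hidden_dim, state_dim)

    def forward(self, state, action):
        x = torch.relu(self.encode(torch.cat([state, action], dim=-1)))
        x = torch.relu(self.bayes(x))  # samples fresh weights each call
        return self.decode(x)
```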

$q_\psi(w)$ is easily available, but $q_\psi(w \mid s_{t+1})$ is described as the value of $q$ that minimizes the free energy $F = D_{KL}[q_\psi(w) \,\|\, p_\psi(w)] - \mathbb{E}_{q_\psi(w)}[\log p_\psi(s_{t+1} \mid w)]$, and $p_\psi$ isn't given. I think I understand the concept, but I'm puzzled about how to implement it.
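
Since Blitz's variational loss has exactly this form (a KL complexity term plus an expected negative log-likelihood), one minimization step of $F$ could look like the sketch below. It assumes a Gaussian likelihood, so the expectation term reduces to an MSE up to constants, and `kl_weight` is a hypothetical scaling factor I introduce for illustration; `nn_kl_divergence()` is provided by Blitz's `variational_estimator` decorator used above.

```python
import torch
import torch.nn.functional as F

def free_energy_step(model, optimizer, state, action, next_state,
                     kl_weight=1e-3):
    """One step minimizing F = KL[q(w)||p(w)] - E_q[log p(s_{t+1}|w)]."""
    optimizer.zero_grad()
    pred = model(state, action)          # forward pass samples w ~ q_psi(w)
    nll = F.mse_loss(pred, next_state)   # likelihood term (Gaussian assumption)
    kl = model.nn_kl_divergence()        # KL of weights to Blitz's prior
    loss = kl_weight * kl + nll
    loss.backward()
    optimizer.step()
    return loss.item()
```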

  • To update: I am learning a lot from the paper that inspired Blitz, "Weight Uncertainty in Neural Networks" (also cited in "A Curiosity Algorithm for Robots Based on the Free Energy Principle"). It seems the loss function for the weight probability distributions is, in fact, the variational free energy. Hence, $D_{KL}[q_\psi(w \mid s_{t+1}) \,\|\, q_\psi(w)]$ literally describes comparing the weight distributions before and after training (sketched after these comments). – Ted Tinker Nov 20 '22 at 02:24
  • You can use mathjax on this site. – nbro Dec 11 '22 at 13:06
  • Oh, thank you. I'll try editing the question. – Ted Tinker Dec 12 '22 at 05:33
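
Under that reading, curiosity has a closed form: the KL divergence between the Bayesian layer's diagonal-Gaussian weight distribution after the update ($q_\psi(w \mid s_{t+1})$) and before it ($q_\psi(w)$). A sketch, assuming Blitz's `weight_mu`/`weight_rho` attribute names (verify against your installed version) and the `free_energy_step` helper above:

```python
import torch

def weight_sigma(layer):
    # Blitz parameterizes sigma = log(1 + exp(rho))
    return torch.log1p(torch.exp(layer.weight_rho))

def curiosity(layer, mu_old, sigma_old):
    """Closed-form D_KL[q_psi(w|s_{t+1}) || q_psi(w)] for diagonal Gaussians,
    comparing the layer's weight distribution after vs. before the update."""
    # Detach: curiosity is a reward signal, not differentiated through.
    mu_new = layer.weight_mu.detach()
    sigma_new = weight_sigma(layer).detach()
    kl = (torch.log(sigma_old / sigma_new)
          + (sigma_new ** 2 + (mu_new - mu_old) ** 2) / (2 * sigma_old ** 2)
          - 0.5)
    return kl.sum()

# Usage: snapshot psi before training on (s_t, a_t, s_{t+1}), take one
# free-energy step, then measure how far the weight distribution moved.
# mu_old = model.bayes.weight_mu.detach().clone()
# sigma_old = weight_sigma(model.bayes).detach().clone()
# free_energy_step(model, optimizer, state, action, next_state)
# r_curiosity = curiosity(model.bayes, mu_old, sigma_old)
```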
