I'm trying to implement the paper *A Curiosity Algorithm for Robots Based on the Free Energy Principle* in a reinforcement-learning environment using PyTorch, but I'm unclear on how curiosity is actually calculated.
As the paper describes, I built a 'transitioner' that predicts the next state given the current state and action. One layer of the transitioner is a Bayesian linear layer from Blitz; the distribution over that layer's weights is $q_\psi(w) = \mathcal{N}(w\,|\,\mu, \sigma^2)$, where $\psi = \{\mu, \sigma\}$. Curiosity is then defined as the Kullback–Leibler divergence $D_{KL}[q_\psi(w|s_{t+1})\,\|\,q_\psi(w)]$.
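For reference, here is a minimal, self-contained sketch of my transitioner. The Bayesian layer is a hand-rolled mean-field stand-in for `blitz.modules.BayesianLinear` (so the snippet runs without Blitz), and all layer sizes are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Mean-field Gaussian linear layer: a simplified stand-in for
    blitz.modules.BayesianLinear. Variational parameters psi = {mu, rho},
    with sigma = softplus(rho) to keep sigma positive."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def sigma(self):
        return F.softplus(self.weight_rho)

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I).
        eps = torch.randn_like(self.weight_mu)
        w = self.weight_mu + self.sigma() * eps
        return x @ w.t() + self.bias


class Transitioner(nn.Module):
    """Predicts s_{t+1} from (s_t, a_t); one Bayesian layer, rest deterministic."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU()
        )
        self.bayes = BayesianLinear(hidden, state_dim)

    def forward(self, state, action):
        return self.bayes(self.encoder(torch.cat([state, action], dim=-1)))
```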
$q_\psi(w)$ is easily available, but $q_\psi(w|s_{t+1})$ is described as the value of $q$ that minimizes the free energy $F = D_{KL}[q_\psi(w)\,\|\,p_\psi(w)] - \mathbb{E}_{q_\psi(w)}[\log p_\psi(s_{t+1}|w)]$, and $p_\psi$ isn't given. I think I understand the concept, but I'm puzzled about how to implement this.
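My current reading is that $q_\psi(w|s_{t+1})$ is simply whatever the variational parameters become after taking gradient steps on $F$ for the observed transition, and that the unspecified $p_\psi(w)$ is just a fixed prior (I'm assuming a standard normal). Here is a self-contained toy sketch of that reading, with placeholder dimensions, a single Bayesian weight matrix standing in for the transitioner, and a unit-variance Gaussian likelihood (also my assumption):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_dim, out_dim = 6, 4  # placeholder sizes: (state_dim + action_dim) -> state_dim
mu = torch.zeros(out_dim, in_dim, requires_grad=True)
rho = torch.full((out_dim, in_dim), -3.0, requires_grad=True)  # sigma = softplus(rho)


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    # Closed-form KL between diagonal Gaussians, summed over all weights.
    return (torch.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5).sum()


def free_energy(mu, rho, x, target, prior_mu, prior_sigma):
    # F = KL[q(w) || p(w)] - E_q[log p(s_{t+1}|w)], with the expectation
    # estimated by one Monte Carlo sample and a unit-variance Gaussian likelihood.
    sigma = F.softplus(rho)
    w = mu + sigma * torch.randn_like(mu)
    nll = 0.5 * ((x @ w.t() - target) ** 2).sum()
    return gaussian_kl(mu, sigma, prior_mu, prior_sigma) + nll


# A fake observed transition (s_t, a_t) -> s_{t+1}.
x = torch.randn(1, in_dim)
s_next = torch.randn(1, out_dim)

# Snapshot q_psi(w) BEFORE the update.
mu_old = mu.detach().clone()
sigma_old = F.softplus(rho).detach().clone()

# One gradient step on F approximates q_psi(w | s_{t+1}).
opt = torch.optim.SGD([mu, rho], lr=1e-2)
opt.zero_grad()
free_energy(mu, rho, x, s_next,
            torch.zeros_like(mu), torch.ones_like(mu)).backward()
opt.step()

# Curiosity = KL[q_psi(w | s_{t+1}) || q_psi(w)], again in closed form.
curiosity = gaussian_kl(mu.detach(), F.softplus(rho.detach()), mu_old, sigma_old)
print(float(curiosity))  # a non-negative scalar
```

Is computing curiosity as the closed-form KL between the post-update and pre-update variational parameters the intended approach, or does the paper mean something else by minimizing $F$?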