I am confused as to when to hold certain parameters constant in a VAE. I will explain with a concrete example.
We can write $\operatorname{ELBO}(\phi, \theta) = \mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x \mid z)\right] - D_{\operatorname{KL}}\left[q_{\phi}(z|x) \,\|\, p(z)\right]$, where we wish to find $\nabla_{\phi, \theta}\operatorname{ELBO}(\phi, \theta)$. The gradient of the KL divergence is easy to take, since for the usual choices of posterior and prior it has a closed form.
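For concreteness, here is the closed form I have in mind, assuming a diagonal-Gaussian posterior $q_{\phi}(z|x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard-normal prior $p(z) = \mathcal{N}(0, I)$ (my assumption; other choices would differ):
$$ D_{\operatorname{KL}}\left[q_{\phi}(z|x) \,\|\, p(z)\right] = \frac{1}{2}\sum_{j=1}^{J}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right), $$
which is an explicit function of $\phi$ (through $\mu$ and $\sigma$), so differentiating it poses no problem.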
My issue is with the gradient $\nabla_{\phi, \theta}\mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x \mid z)\right]$. I am assuming that the expectation is intractable, so we use a Monte Carlo (MC) approximation and instead compute $$ \nabla_{\phi, \theta}\left(\frac{1}{L}\sum_{l=1}^{L}\log p_{\theta}\left(x \mid z^{(l)}\right)\right), \qquad z^{(l)} \sim q_{\phi}(z|x), $$ where $L$ is the number of MC samples. From my understanding, the gradient of this term w.r.t. $\phi$ should be non-zero, since changing $\phi$ changes the distribution from which the $z^{(l)}$ are drawn, which in turn changes the value of the sum. However, when I look at derivations of gradient estimators such as the Score Function Estimator, I see that $\log P_{\theta}(x, h)$ is treated as a constant w.r.t. $\phi$. Example from Appendix B of the linked paper:
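To make my intuition concrete, here is a minimal PyTorch sketch (my own toy example, not from any paper) of the reparameterized MC estimate, where $z$ is written as an explicit function of $\phi$ and the $\phi$-gradient clearly flows through the samples:

```python
import torch

torch.manual_seed(0)

# Encoder outputs for a single x; these play the role of phi.
mu = torch.tensor([0.5], requires_grad=True)
log_sigma = torch.tensor([-1.0], requires_grad=True)

L = 1000
eps = torch.randn(L, 1)               # eps ~ N(0, I), independent of phi
z = mu + log_sigma.exp() * eps        # reparameterization: z^(l) is a function of phi

# Toy decoder with no theta, for simplicity: log p(x|z) = -0.5 * (x - z)^2
x = torch.tensor([1.0])
log_px_given_z = -0.5 * (x - z) ** 2

mc_estimate = log_px_given_z.mean()   # (1/L) * sum_l log p(x | z^(l))
mc_estimate.backward()
print(mu.grad, log_sigma.grad)        # both non-zero: the gradient flows through z
```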
$$ \nabla_{\phi}\sum_{h}Q_{\phi}(h|x)\log P_{\theta}(x, h) = \sum_{h}\log P_{\theta}(x, h)\nabla_{\phi}Q_{\phi}(h|x). $$ I am not sure how to reconcile these two views. One potential explanation is that the latent $h$ is being treated as fixed in the above equation; however, I don't see why this should be the case, since $h$ is a function of the parameters $\phi$, and so changing $\phi$ would in turn change $h$ and thus the value of $\log P_{\theta}(x, h)$.
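For contrast, here is how I currently understand the score-function version in code (again my own toy sketch, with a Bernoulli latent of my choosing): $h$ is drawn with `.sample()`, which blocks any gradient path through the samples, so the $\phi$-gradient enters only via $\log Q_{\phi}(h|x)$:

```python
import torch

torch.manual_seed(0)

logits = torch.tensor([0.3], requires_grad=True)  # phi
q = torch.distributions.Bernoulli(logits=logits)  # Q_phi(h|x)

L = 1000
h = q.sample((L,))                                # h is a sample; .sample() blocks gradients

# Toy stand-in for log P_theta(x, h); held constant w.r.t. phi.
log_p = -0.5 * (1.0 - h) ** 2

# Surrogate whose phi-gradient is the score function estimator:
# (1/L) sum_l log P(x, h^(l)) * grad_phi log Q_phi(h^(l)|x)
surrogate = (log_p * q.log_prob(h)).mean()
surrogate.backward()
print(logits.grad)                                # gradient enters only via log Q_phi(h|x)
```

Is this treatment of $h$ as a fixed sample the reason the gradient is not taken through $\log P_{\theta}(x, h)$?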