4

I came across the "reparametrization trick" for the first time in the following paragraph from the chapter titled Vector Calculus in the textbook Mathematics for Machine Learning by Marc Peter Deisenroth et al.:

> The Jacobian determinant and variable transformations will become relevant ... when we transform random variables and probability distributions. These transformations are extremely relevant in machine learning in the context of training deep neural networks using the reparametrization trick, also called infinite perturbation analysis.

The quoted paragraph mentions the trick in the context of training neural networks. However, when I search for the reparametrization trick, I find it discussed only (or at least mostly) in the context of training autoencoders.

In the context of training a traditional deep neural network, is the trick useful?

hanugm

2 Answers

2

The reparameterization trick (also known as the pathwise derivative or infinitesimal perturbation analysis) is a method for calculating the gradient of a function of a random variable. It is used, for example, in variational autoencoders or deterministic policy gradient algorithms.
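For concreteness, here is a minimal sketch of the pathwise (reparameterized) estimator in PyTorch; the Gaussian parameterization and the function `f` are arbitrary choices for illustration. The sample is written as a deterministic, differentiable function of the parameters and an auxiliary noise variable, so autograd can differentiate straight through it.

```python
import torch

# Goal: estimate the gradient of E_{z ~ N(mu, sigma^2)}[f(z)] with respect to
# mu and sigma.  Writing z = mu + sigma * eps with eps ~ N(0, 1) makes the
# sample a differentiable function of the parameters, so ordinary
# backpropagation yields the pathwise gradient estimate.

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

def f(z):
    # Arbitrary differentiable function of the random variable (illustration only).
    return (z - 2.0) ** 2

eps = torch.randn(10_000)               # eps ~ N(0, 1), independent of the parameters
z = mu + torch.exp(log_sigma) * eps     # reparameterized samples
loss = f(z).mean()                      # Monte Carlo estimate of E[f(z)]
loss.backward()

print(mu.grad, log_sigma.grad)          # pathwise gradient estimates
```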

If you plan on working with models that involve random variables, you definitely need to understand what the reparameterization trick is.

You will also need to understand the other main method for computing gradients of functions of random variables, known as the likelihood ratio (also called the score function or the REINFORCE gradient).
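For comparison, here is the same estimation problem solved with a score-function (likelihood-ratio) surrogate, again only an illustrative sketch with the same made-up `f`: no gradient flows through the samples themselves, only through the log-density evaluated at them.

```python
import torch

# Score-function (likelihood-ratio / REINFORCE) estimator:
# grad E[f(z)] = E[f(z) * grad log p(z; theta)].

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

def f(z):
    return (z - 2.0) ** 2  # same illustrative function as above

dist = torch.distributions.Normal(mu, torch.exp(log_sigma))
z = dist.sample((10_000,))                    # detached samples, no reparameterization
surrogate = (f(z) * dist.log_prob(z)).mean()  # surrogate whose gradient is the estimator
surrogate.backward()

print(mu.grad, log_sigma.grad)                # unbiased, but typically noisier
```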

If your definition of a "traditional" neural network does not involve random variables, then such a method is irrelevant.

nbro
Taw
  • Note that the re-parametrization trick is used to reduce the variance (i.e. it's a _variance reduction technique_). The authors of the VAE state this in the VAE paper (if I remember correctly). So, is the re-parametrization really needed or only needed to reduce the variance? Moreover, are you sure that "REINFORCE gradient" is generally a synonym for "likelihood ratio"? Maybe in the context of reinforcement learning and the REINFORCE algorithm, but isn't the likelihood ratio something more general? – nbro Nov 19 '21 at 11:34
  • Yes, REINFORCE gradient definitely means the same thing as score function or likelihood ratio (at least when the latter two are talking about gradient estimation). – Taw Nov 19 '21 at 16:22
  • The reparameterization trick has lower variance than likelihood ratios. I guess you could call it a variance reduction technique in this sense (assuming that either method can be applied), but typically variance reduction is referring to the use of some sort of control variate. – Taw Nov 19 '21 at 16:27
  • I agree with @nbro, REINFORCE gradient is not synonymous with the likelihood ratio. The likelihood ratio is something extremely general from statistics that has been around much longer than REINFORCE. – David Dec 19 '21 at 12:40
  • https://arxiv.org/abs/1506.05254 page 3, nomenclature, "What we call the score function estimator (via [3]) is alternatively called the likelihood ratio estimator [5] and REINFORCE [26]." – Taw Dec 19 '21 at 15:47
  • @Taw The point is: [REINFORCE is a **reinforcement learning** algorithm](http://incompleteideas.net/book/RLbook2020.pdf#page=348), while the term "score function" is used more generally. Do you understand what we want to say? The score function appears e.g. in variational inference too. Saying that REINFORCE is the score function is similar to saying that gradient descent is the chain rule. No. Gradient descent is an algorithm that uses a specific version of the chain rule applied to neural nets. So, REINFORCE is an algorithm, it's not a score function. It uses a score function. – nbro Dec 19 '21 at 16:42
  • See [my answer](https://ai.stackexchange.com/a/33829/2444) here, for instance, and the linked paper about Monte Carlo gradient estimation. – nbro Dec 19 '21 at 16:48
0

Yes, the reparametrization trick can be useful in the context of variational Bayesian neural networks (BNNs), although other, more effective variance-reduction techniques are more commonly used, in particular the flipout estimator. See this implementation of BNNs that uses flipout; TensorFlow Probability, the library used to implement that example, also provides layers that implement the reparametrization trick.
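As a rough illustration (a minimal sketch assuming TensorFlow 2.x with `tf.keras` and a compatible TensorFlow Probability release, not the exact example referenced above), a small BNN built from such layers could look like the following; swapping `DenseFlipout` for `DenseReparameterization` switches from the flipout estimator to the plain reparameterization trick.

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Two variational dense layers whose weights are distributions rather than
# point estimates; gradients of the weight posterior are estimated with the
# flipout estimator.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(784,)),
    tfp.layers.DenseFlipout(64, activation="relu"),  # variational hidden layer
    tfp.layers.DenseFlipout(10),                     # logits for 10 classes
])

# Each variational layer registers its KL(q(w) || p(w)) term via add_loss;
# Keras folds these into the training loss together with the data term below
# (in practice the KL term is often rescaled by the dataset size).
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```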

Note that the reparametrization trick is used in the context of variational auto-encoders (VAEs), not in the context of deterministic auto-encoders. VAEs and BNNs have a lot in common: both are based on stochastic variational inference (i.e. variational inference combined with stochastic gradient descent). So, whenever your model involves some sampling or other stochastic operation, the reparametrization trick could turn out to be useful. However, these two are the only types of models I am currently familiar with that use it.
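As a small illustration of that last point (using PyTorch's distributions API purely as an example, not something tied to the models above), many libraries expose the distinction directly: a plain sample has no gradient path back to the distribution's parameters, while a reparameterized sample does.

```python
import torch

mu = torch.tensor(0.0, requires_grad=True)
sigma = torch.tensor(1.0, requires_grad=True)
dist = torch.distributions.Normal(mu, sigma)

z_plain = dist.sample()     # drawn under no_grad: no path back to mu, sigma
z_reparam = dist.rsample()  # z = mu + sigma * eps: differentiable w.r.t. mu, sigma

print(z_plain.requires_grad, z_reparam.requires_grad)  # False True
```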

nbro