Implicit Reparameterization Gradients

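Backpropagation through a stochastic node is an important problem in deep learning. Optimizing an expected objective $\mathbb{E}_{q_\phi(z)}[f(z)]$ with respect to the distribution parameters $\phi$ requires computing the gradient $\nabla_\phi \mathbb{E}_{q_\phi(z)}[f(z)]$. Stochastic variational inference, for example, requires the gradient of one such expectation.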

Earlier methods of gradient computation include score-function-based estimators (REINFORCE) and pathwise gradient estimators (the reparameterization trick [3]). Recent works have proposed using reparameterizable surrogate distributions, such as Gumbel-Softmax [2] for the Categorical distribution and Kumaraswamy for the Beta distribution. Other recent works, such as Generalized Reparameterization Gradients (GRG) [5] and Rejection Sampling Variational Inference (RSVI) [4], have sought to build a generalized framework for gradient computation.

Explicit Reparameterization

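Explicit reparameterization requires a standardization function $S_\phi(z)$ such that $S_\phi(z) = \epsilon \sim q(\epsilon)$, where the noise distribution $q(\epsilon)$ does not depend on $\phi$. It also requires $S_\phi$ to be invertible, so that a sample can be written as $z = S_\phi^{-1}(\epsilon)$ and

$$\nabla_\phi \mathbb{E}_{q_\phi(z)}[f(z)] = \mathbb{E}_{q(\epsilon)}\left[\nabla_z f\left(S_\phi^{-1}(\epsilon)\right) \nabla_\phi S_\phi^{-1}(\epsilon)\right].$$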
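As a concrete illustration of the explicit (pathwise) estimator, here is a minimal sketch for an Exponential distribution with rate $\lambda$. The choice of distribution, the test function $f(z) = z^2$, and the helper name `explicit_grad_estimate` are assumptions made for this example only. The standardization function is $S_\lambda(z) = \lambda z \sim \text{Exponential}(1)$, so $z = \epsilon / \lambda$ and $\partial z / \partial \lambda = -\epsilon / \lambda^2$.

```python
import random

def explicit_grad_estimate(rate, n_samples=50_000, seed=0):
    """Monte Carlo estimate of d/d(rate) E[z^2] for z ~ Exponential(rate)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.expovariate(1.0)    # noise eps ~ Exponential(1), independent of rate
        z = eps / rate                # z = S^{-1}(eps), because S(z) = rate * z
        df_dz = 2.0 * z               # gradient of f(z) = z^2
        dz_drate = -eps / rate ** 2   # derivative of S^{-1}(eps) w.r.t. rate
        total += df_dz * dz_drate     # chain rule through the sample
    return total / n_samples

rate = 2.0
estimate = explicit_grad_estimate(rate)
exact = -4.0 / rate ** 3              # since E[z^2] = 2 / rate^2
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

The estimate should land close to the closed-form value; the point is only that the sampling path $z = S_\lambda^{-1}(\epsilon)$ must be available in closed form for this approach to work.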

Implicit Reparameterization

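Implicit reparameterization eliminates the restrictive requirement of a tractable inverse of the standardization function $S_\phi$. The gradient $\nabla_\phi z$ is obtained by implicitly differentiating $S_\phi(z) = \epsilon$, so $S_\phi^{-1}$ never has to be evaluated.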

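Figure 1. Implicit reparameterization.

Differentiating both sides of $S_\phi(z) = \epsilon$ with respect to $\phi$ gives

$$\frac{d S_\phi(z)}{d \phi} = \frac{d \epsilon}{d \phi} = 0 \tag{1}$$

$$\nabla_z S_\phi(z) \, \nabla_\phi z + \nabla_\phi S_\phi(z) = 0 \quad \Longrightarrow \quad \nabla_\phi z = -\left(\nabla_z S_\phi(z)\right)^{-1} \nabla_\phi S_\phi(z) \tag{2}$$

where Eq. (1) uses the fact that the total derivative of the noise $\epsilon$ with respect to the distribution parameters $\phi$ is 0, and Eq. (2) applies the multivariate chain rule based on Figure 1. The gradient of the expectation is then $\nabla_\phi \mathbb{E}_{q_\phi(z)}[f(z)] = \mathbb{E}_{q_\phi(z)}[\nabla_z f(z) \, \nabla_\phi z]$.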
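The implicit gradient $\nabla_\phi z = -(\nabla_z S_\phi(z))^{-1} \nabla_\phi S_\phi(z)$ only needs derivatives of $S_\phi$, never its inverse. The sketch below illustrates this for a scalar $z$, assuming a log-normal distribution with $S_\phi(z) = (\ln z - \mu)/\sigma$. The helper names `standardize` and `implicit_grad` are hypothetical, and central finite differences stand in for the automatic differentiation one would normally use.

```python
import math
import random

def standardize(z, mu, sigma):
    """S_phi(z) for a log-normal: maps z to standard normal noise."""
    return (math.log(z) - mu) / sigma

def implicit_grad(S, z, phi, h=1e-6):
    """Return [dz/dphi_i] = -(dS/dphi_i) / (dS/dz) using central differences."""
    dS_dz = (S(z + h, *phi) - S(z - h, *phi)) / (2 * h)
    grads = []
    for i in range(len(phi)):
        up, down = list(phi), list(phi)
        up[i] += h
        down[i] -= h
        dS_dphi = (S(z, *up) - S(z, *down)) / (2 * h)
        grads.append(-dS_dphi / dS_dz)
    return grads

mu, sigma = 0.3, 0.8
rng = random.Random(0)
z = rng.lognormvariate(mu, sigma)   # the sampler is a black box: no access to eps

dz_dmu, dz_dsigma = implicit_grad(standardize, z, (mu, sigma))

# Reference values from the explicit path z = exp(mu + sigma * eps):
# dz/dmu = z and dz/dsigma = z * eps, with eps recovered as S_phi(z).
eps = standardize(z, mu, sigma)
print(dz_dmu, z)
print(dz_dsigma, z * eps)
```

Each printed pair should agree up to finite-difference error, which shows that the implicit route reproduces the explicit gradients without differentiating through the sampler.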


Normal Distribution

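The standardization function for the normal distribution $\mathcal{N}(\mu, \sigma^2)$, with $\phi = (\mu, \sigma)$, is $S_\phi(z) = (z - \mu)/\sigma \sim \mathcal{N}(0, 1)$.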

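  • Explicit Reparameterization: $z = S_\phi^{-1}(\epsilon) = \mu + \sigma \epsilon$, so $\frac{\partial z}{\partial \mu} = 1$ and $\frac{\partial z}{\partial \sigma} = \epsilon$.
  • Implicit Reparameterization: $\frac{\partial z}{\partial \mu} = -\frac{\partial S_\phi(z) / \partial \mu}{\partial S_\phi(z) / \partial z} = -\frac{-1/\sigma}{1/\sigma} = 1$ and $\frac{\partial z}{\partial \sigma} = -\frac{\partial S_\phi(z) / \partial \sigma}{\partial S_\phi(z) / \partial z} = -\frac{-(z - \mu)/\sigma^2}{1/\sigma} = \frac{z - \mu}{\sigma}$.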
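The two bullets can be checked numerically. The sketch below compares the explicit gradients $(1, \epsilon)$ with the implicit gradients $(1, (z - \mu)/\sigma)$ sample by sample, then uses the implicit ones to estimate $\nabla_\phi \mathbb{E}[z^2]$. The test function $f(z) = z^2$ (for which $\mathbb{E}[z^2] = \mu^2 + \sigma^2$), the parameter values, and the function names are assumptions chosen for illustration.

```python
import random

def explicit_grads(eps):
    """Gradients of z = mu + sigma * eps w.r.t. (mu, sigma)."""
    return 1.0, eps

def implicit_grads(z, mu, sigma):
    """Gradients of z w.r.t. (mu, sigma) from S(z) = (z - mu) / sigma."""
    dS_dz = 1.0 / sigma
    dS_dmu = -1.0 / sigma
    dS_dsigma = -(z - mu) / sigma ** 2
    return -dS_dmu / dS_dz, -dS_dsigma / dS_dz

mu, sigma = 0.5, 1.2
rng = random.Random(0)
n = 50_000
grad_mu = grad_sigma = 0.0
max_gap = 0.0
for _ in range(n):
    eps = rng.gauss(0.0, 1.0)
    z = mu + sigma * eps
    e_mu, e_sigma = explicit_grads(eps)
    i_mu, i_sigma = implicit_grads(z, mu, sigma)
    max_gap = max(max_gap, abs(e_mu - i_mu), abs(e_sigma - i_sigma))
    grad_mu += 2.0 * z * i_mu        # df/dz * dz/dmu with f(z) = z^2
    grad_sigma += 2.0 * z * i_sigma  # df/dz * dz/dsigma
grad_mu /= n
grad_sigma /= n

print("largest explicit/implicit gap:", max_gap)
print("estimated:", (grad_mu, grad_sigma), "exact:", (2 * mu, 2 * sigma))
```

For the normal distribution the two routes coincide up to floating-point rounding, so nothing is gained here; the implicit form matters for distributions where $S_\phi^{-1}$ is unavailable.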

Using Cumulative Distribution Function

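The CDF can be used as a standardization function by using the property that, for a continuous random variable $z \sim q_\phi(z)$, the random variable $F(z \mid \phi)$ has the uniform distribution on $[0, 1]$, where $F$ is the CDF. Since $\nabla_z F(z \mid \phi) = q_\phi(z)$, the gradient can then be computed as follows.

$$\nabla_\phi z = -\frac{\nabla_\phi F(z \mid \phi)}{q_\phi(z)}$$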
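With the CDF as the standardization function, the implicit gradient becomes $\nabla_\phi z = -\nabla_\phi F(z \mid \phi) / q_\phi(z)$. Below is a rough sketch for a Gamma distribution with shape $\alpha$ and unit scale, where the inverse CDF has no closed form. Everything here is an assumption for illustration: the CDF is evaluated with the standard power series for the regularized incomplete gamma function, and its derivative with respect to $\alpha$ is approximated by central differences rather than differentiated exactly.

```python
import math
import random

def gamma_cdf(z, alpha):
    """Regularized lower incomplete gamma P(alpha, z) via its power series (z > 0)."""
    term = 1.0 / alpha
    total = term
    for n in range(1, 1000):
        term *= z / (alpha + n)
        total += term
        if term < 1e-17 * total:   # remaining terms are negligible
            break
    return total * math.exp(alpha * math.log(z) - z - math.lgamma(alpha))

def gamma_pdf(z, alpha):
    """Density of Gamma(alpha, scale=1) at z > 0."""
    return math.exp((alpha - 1.0) * math.log(z) - z - math.lgamma(alpha))

def implicit_dz_dalpha(z, alpha, h=1e-5):
    """dz/dalpha = -(dF/dalpha) / q(z), with dF/dalpha by central differences."""
    dF_dalpha = (gamma_cdf(z, alpha + h) - gamma_cdf(z, alpha - h)) / (2 * h)
    return -dF_dalpha / gamma_pdf(z, alpha)

alpha = 2.0
rng = random.Random(0)
samples = [rng.gammavariate(alpha, 1.0) for _ in range(2000)]

# With f(z) = z we have E[z] = alpha, so the exact gradient d/dalpha E[z] is 1.
grad_estimate = sum(implicit_dz_dalpha(z, alpha) for z in samples) / len(samples)
print("estimated d/dalpha E[z]:", grad_estimate)
```

The samples come from the standard library's Gamma sampler, which is not differentiable; the gradient is attached afterwards from the CDF alone, which is the practical appeal of the implicit approach.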


Implicit reparameterization allows stochastic backpropagation through a variety of distributions, such as truncated distributions, mixtures, Gamma, von Mises, and Beta. Check out these slides and the paper [1].


[1] Figurnov, M., Mohamed, S. and Mnih, A., 2018. Implicit Reparameterization Gradients. arXiv preprint arXiv:1805.08498.
[2] Jang, E., Gu, S. and Poole, B., 2016. Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144.
[3] Kingma, D.P. and Welling, M., 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[4] Naesseth, C.A., Ruiz, F.J., Linderman, S.W. and Blei, D.M., 2016. Reparameterization gradients through acceptance-rejection sampling algorithms. arXiv preprint arXiv:1610.05683.
[5] Ruiz, F.J.R., Titsias, M.K. and Blei, D.M., 2016. The generalized reparameterization gradient. In Advances in Neural Information Processing Systems (pp. 460-468).

Written on September 21, 2018