Backpropagation Through the Void: Optimizing Control Variates for Black-Box Gradient Estimation

Abstract

Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables, based on gradients of a learned function. These estimators can be jointly trained with model parameters or policies, and are applicable in both discrete and continuous settings. We give unbiased, adaptive analogs of state-of-the-art reinforcement learning methods such as advantage actor-critic. We also demonstrate this framework for training discrete latent-variable models.
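To make the abstract's idea concrete, here is a minimal sketch of the paper's continuous-case (LAX-style) estimator, under illustrative assumptions not taken from this page: a Gaussian sample z ~ N(theta, 1), a toy black-box objective f(z) = z², a small hypothetical network `c_phi` as the learned control variate, and PyTorch as the framework. The estimator combines a score-function term [f(z) − c_φ(z)] ∇_θ log p(z|θ) with the reparameterization gradient ∇_θ c_φ(z); since it is unbiased for every φ, the control variate can be trained jointly by descending a single-sample estimate of the estimator's variance, E[ĝ²].

```python
import torch

# Sketch of a LAX-style estimator (illustrative, not the authors' exact code):
#   g_hat = [f(z) - c_phi(z)] * d/dtheta log p(z|theta) + d/dtheta c_phi(z)
# g_hat is unbiased for any phi, so phi is trained to minimize E[g_hat^2],
# whose phi-gradient equals the phi-gradient of Var(g_hat).

def f(z):
    return z ** 2  # stand-in for a black-box objective (its gradients are never used)

theta = torch.tensor(2.0, requires_grad=True)
c_phi = torch.nn.Sequential(  # learned control variate / differentiable surrogate
    torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
)
opt_theta = torch.optim.SGD([theta], lr=1e-2)
opt_phi = torch.optim.Adam(c_phi.parameters(), lr=1e-2)

for step in range(2000):
    eps = torch.randn(())
    z = theta + eps                           # reparameterized sample, z ~ N(theta, 1)
    log_p = -0.5 * (z.detach() - theta) ** 2  # log N(z; theta, 1), up to a constant
    c = c_phi(z.unsqueeze(0)).squeeze()

    # Score-function and reparameterization terms, kept differentiable
    # w.r.t. phi (create_graph=True) so the variance loss can be backpropagated.
    d_log_p = torch.autograd.grad(log_p, theta, create_graph=True)[0]
    d_c = torch.autograd.grad(c, theta, create_graph=True)[0]
    g_hat = (f(z.detach()) - c) * d_log_p + d_c  # unbiased for any phi

    # Update phi: g_hat^2 is a single-sample estimate of E[g_hat^2], whose
    # phi-gradient matches that of Var(g_hat) because the mean is fixed.
    phi_grads = torch.autograd.grad(g_hat ** 2, list(c_phi.parameters()))
    opt_phi.zero_grad()
    for p, g in zip(c_phi.parameters(), phi_grads):
        p.grad = g
    opt_phi.step()

    # Update theta with the estimated gradient of E[f(z)] (true value: 2 * theta).
    opt_theta.zero_grad()
    theta.grad = g_hat.detach()
    opt_theta.step()
```

As the surrogate c_φ approaches f, the bracketed score-function term shrinks toward zero and the estimator's variance falls, while unbiasedness holds throughout; the paper's discrete-variable (RELAX) case replaces the reparameterized sample with relaxed and conditionally resampled variables, which this sketch omits.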

Cite

Text

Grathwohl et al. "Backpropagation Through the Void: Optimizing Control Variates for Black-Box Gradient Estimation." International Conference on Learning Representations, 2018.

Markdown

[Grathwohl et al. "Backpropagation Through the Void: Optimizing Control Variates for Black-Box Gradient Estimation." International Conference on Learning Representations, 2018.](https://mlanthology.org/iclr/2018/grathwohl2018iclr-backpropagation/)

BibTeX

@inproceedings{grathwohl2018iclr-backpropagation,
  title     = {{Backpropagation Through the Void: Optimizing Control Variates for Black-Box Gradient Estimation}},
  author    = {Grathwohl, Will and Choi, Dami and Wu, Yuhuai and Roeder, Geoff and Duvenaud, David},
  booktitle = {International Conference on Learning Representations},
  year      = {2018},
  url       = {https://mlanthology.org/iclr/2018/grathwohl2018iclr-backpropagation/}
}