No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

Abstract

Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms. Estimation of meta-gradients is central to the performance of these meta-algorithms, and has been studied in the setting of MAML-style short-horizon meta-RL problems. In this context, prior work has investigated the estimation of the Hessian of the RL objective, as well as tackling the problem of credit assignment to pre-adaptation behavior by making a sampling correction. However, we show that Hessian estimation, implemented for example by DiCE and its variants, always adds bias and can also add variance to meta-gradient estimation. DiCE-like approaches are therefore unlikely to lie on the Pareto frontier of the bias-variance tradeoff and should not be pursued in the context of meta-gradients for RL. Meanwhile, the sampling correction has not been studied in the important long-horizon setting, where the inner optimization trajectories must be truncated for computational tractability. We study the bias-variance tradeoff induced by truncated backpropagation in combination with a weighted sampling correction. While prior work has implicitly chosen points in this bias-variance space, we disentangle the sources of bias and variance and present an empirical study that relates existing estimators to each other.
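For readers unfamiliar with how DiCE estimates Hessians, the sketch below shows its "MagicBox" operator, defined in the original DiCE paper as exp(x - stop_gradient(x)). The operator evaluates to 1 in the forward pass, yet each differentiation multiplies in the derivative of its argument. This makes repeated autodiff produce score-function estimates of higher-order derivatives. This is a minimal sketch, not the authors' code. It assumes a hypothetical one-step Bernoulli policy with P(action = 1) = sigmoid(theta), and the function names are illustrative. It does not reproduce the estimators or experiments studied here.

```python
import jax
import jax.numpy as jnp


def magic_box(x):
    # DiCE MagicBox: forward value is exactly 1, but
    # d/dtheta magic_box(x) = magic_box(x) * dx/dtheta.
    return jnp.exp(x - jax.lax.stop_gradient(x))


def log_prob(theta, actions):
    # Hypothetical toy policy: Bernoulli with P(a = 1) = sigmoid(theta).
    return jnp.where(actions == 1,
                     jax.nn.log_sigmoid(theta),
                     jax.nn.log_sigmoid(-theta))


def dice_objective(theta, actions, rewards):
    # Each sample is treated as an independent one-step episode.
    # Forward value equals the mean return, since magic_box(...) == 1.
    return jnp.mean(magic_box(log_prob(theta, actions)) * rewards)


def naive_surrogate(theta, actions, rewards):
    # Standard score-function surrogate: correct first derivative only.
    return jnp.mean(log_prob(theta, actions) * rewards)


if __name__ == "__main__":
    theta = jnp.float32(0.3)
    actions = jnp.array([1, 0, 1, 1, 0])
    rewards = jnp.array([1.0, -0.5, 2.0, 0.0, 1.5])
    # First derivatives agree; second derivatives differ.
    print(jax.grad(dice_objective)(theta, actions, rewards))
    print(jax.grad(naive_surrogate)(theta, actions, rewards))
    print(jax.grad(jax.grad(dice_objective))(theta, actions, rewards))
    print(jax.grad(jax.grad(naive_surrogate))(theta, actions, rewards))
```

Differentiating the naive surrogate twice yields only mean(R * d²log π), dropping the squared-score term mean(R * (d log π)²). DiCE recovers that term, giving the sample estimate mean(R * ((a - σ)² - σ(1 - σ))) for this toy policy. Per the abstract, such Hessian estimates add bias to meta-gradient estimation and can also add variance.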

Cite

Vuorio et al. "No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients." NeurIPS 2021 Workshops: DeepRL, 2021.

BibTeX

@inproceedings{vuorio2021neuripsw-dice,
  title     = {{No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients}},
  author    = {Vuorio, Risto and Beck, Jacob Austin and Farquhar, Gregory and Foerster, Jakob Nicolaus and Whiteson, Shimon},
  booktitle = {NeurIPS 2021 Workshops: DeepRL},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/vuorio2021neuripsw-dice/}
}