Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning
Abstract
Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number of poorly understood discrepancies between theory and practice. Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, we investigate this discrepancy. In particular, (1) we show that unbiased gradient estimates have variance $\Theta(N)$, which depends linearly on the sample size $N$ of the inner-loop updates; (2) we propose linearized score function (LSF) gradient estimates, which have bias $\mathcal{O}(1/\sqrt{N})$ and variance $\mathcal{O}(1/N)$; (3) we show that most empirical prior work in fact implements variants of the LSF gradient estimates, which implies that practical algorithms "accidentally" introduce bias to achieve better performance; (4) we establish theoretical guarantees for the convergence of LSF gradient estimates in meta-RL to stationary points, with a better dependence on $N$ than prior work when $N$ is large.
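The source of the $\Theta(N)$ variance in claim (1) can be seen in a deliberately tiny toy problem. The sketch below is an illustrative assumption, not the paper's meta-RL construction and not the LSF estimator: the inner loop draws $N$ samples $x_i \sim \mathcal{N}(\theta, 1)$ (treated as non-reparameterizable, as trajectories are in RL), takes one update $\theta' = \theta + \alpha \cdot \mathrm{mean}(x_i - \theta)$, and the outer objective is $F(\theta') = -(\theta' - c)^2$. The unbiased gradient of $\mathbb{E}[F(\theta')]$ then contains a score-function term $F(\theta') \sum_{i=1}^N \nabla_\theta \log p_\theta(x_i)$, a sum of $N$ zero-mean terms multiplied by a quantity that does not vanish, so its variance grows linearly in $N$. The names `unbiased_meta_grad`, `estimator_variance`, `ALPHA`, `C`, and `THETA` are hypothetical.

```python
import random
import statistics

# Toy model (an illustrative assumption, not the paper's meta-RL setup):
#   inner-loop samples x_i ~ N(theta, 1), i = 1..N, treated as non-reparameterizable
#   inner update       theta' = theta + alpha * mean(x_i - theta)
#   outer objective    F(theta') = -(theta' - c)^2
ALPHA, C, THETA = 0.5, 1.0, 0.0


def unbiased_meta_grad(n, rng):
    """One unbiased sample of d/dtheta E_{x ~ p_theta}[F(theta')] with n inner samples."""
    xs = [rng.gauss(THETA, 1.0) for _ in range(n)]
    theta_prime = THETA + ALPHA * sum(x - THETA for x in xs) / n
    f_value = -(theta_prime - C) ** 2
    f_grad = -2.0 * (theta_prime - C)
    # Pathwise term: F'(theta') * d theta'/d theta with the samples held fixed (= 1 - alpha).
    pathwise = f_grad * (1.0 - ALPHA)
    # Score-function term: F(theta') times the sum of the N inner-sample scores,
    # d/dtheta log N(x; theta, 1) = x - theta. This sum drives the Theta(N) variance.
    score = f_value * sum(x - THETA for x in xs)
    return pathwise + score


def estimator_variance(n, trials=4000, seed=0):
    """Monte Carlo estimate of the estimator's variance for inner sample size n."""
    rng = random.Random(seed)
    samples = [unbiased_meta_grad(n, rng) for _ in range(trials)]
    return statistics.variance(samples)


if __name__ == "__main__":
    for n in (25, 100, 400):
        v = estimator_variance(n)
        print(f"N={n:4d}  variance={v:8.1f}  variance/N={v / n:.2f}")
```

For these constants, a short moment calculation gives a variance of roughly $N + 4.5$, so the printed ratio `variance/N` should settle near 1 as $N$ grows; quadrupling $N$ roughly quadruples the variance once $N$ dominates the constant.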
Cite
Tang. "Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning." International Conference on Machine Learning, 2022.
@inproceedings{tang2022icml-biased,
title = {{Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning}},
author = {Tang, Yunhao},
booktitle = {International Conference on Machine Learning},
year = {2022},
pages = {21050-21075},
volume = {162},
url = {https://mlanthology.org/icml/2022/tang2022icml-biased/}
}