One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning

Abstract

Self-tuning algorithms that adapt the learning process online encourage more effective and robust learning. Among the available methods, meta-gradients have emerged as a promising approach. They exploit the differentiability of the learning rule with respect to some of its hyper-parameters to adapt those hyper-parameters online. Although meta-gradients can be accumulated over multiple learning steps to avoid myopic updates, this is rarely done in practice. In this work, we demonstrate that while multi-step meta-gradients do provide a better learning signal in expectation, this comes at the cost of a significant increase in variance, which hinders performance. In light of this analysis, we introduce a novel method that mixes multiple inner steps to obtain a more accurate and robust meta-gradient signal, essentially trading off bias and variance in meta-gradient estimation. When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
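To make the idea of a k-step meta-gradient concrete, here is a minimal toy sketch in plain Python. Everything in it is an illustrative assumption, not the authors' setup: the meta-parameter is a learning rate `eta`, the inner update is noisy gradient descent on a 1-D quadratic, and the meta-objective is the same quadratic evaluated after k inner steps. The derivative `d theta / d eta` is carried forward alongside `theta`, so each k-step meta-gradient is exact for this toy. The hypothetical helper `mixed_meta_grad` combines the 1..K-step estimates with geometric weights `lam**(k-1)`; this is one simple way to mix inner steps and is not necessarily the weighting used in the paper.

```python
import random
from statistics import mean, pvariance


def k_step_meta_grads(theta0, eta, K, a=1.0, c=0.0, noise_std=0.0, rng=None):
    """Return [g_1, ..., g_K], where g_k = dL_out(theta_k)/d eta.

    Toy assumptions: inner step theta <- theta - eta * (a * (theta - c) + noise),
    meta-objective L_out(theta) = 0.5 * (theta - c) ** 2.
    """
    if rng is None:
        rng = random.Random(0)
    theta = theta0
    dtheta = 0.0  # forward-mode tangent d theta_k / d eta (theta_0 does not depend on eta)
    grads = []
    for _ in range(K):
        # Noisy inner gradient; the noise itself does not depend on eta.
        g = a * (theta - c) + rng.gauss(0.0, noise_std)
        # Differentiate theta - eta * g w.r.t. eta; g depends on eta through theta,
        # so dg/d eta = a * dtheta.
        dtheta = dtheta - g - eta * a * dtheta
        theta = theta - eta * g
        # Chain rule through the meta-objective gives the k-step meta-gradient.
        grads.append((theta - c) * dtheta)
    return grads


def mixed_meta_grad(grads, lam):
    """Hypothetical mixing: weight the k-step meta-gradient by lam**(k-1), then normalise."""
    weights = [lam ** k for k in range(len(grads))]
    return sum(w * g for w, g in zip(weights, grads)) / sum(weights)


def meta_grad_stats(K, lam=None, n_seeds=500, noise_std=0.5, eta=0.1):
    """Empirical mean and variance of a meta-gradient estimator over random seeds.

    lam=None uses the plain K-step meta-gradient; otherwise the mixed estimate.
    """
    samples = []
    for seed in range(n_seeds):
        grads = k_step_meta_grads(1.0, eta, K, noise_std=noise_std, rng=random.Random(seed))
        samples.append(grads[-1] if lam is None else mixed_meta_grad(grads, lam))
    return mean(samples), pvariance(samples)


if __name__ == "__main__":
    for label, K, lam in [("1-step", 1, None), ("10-step", 10, None), ("mixed, lam=0.7", 10, 0.7)]:
        m, v = meta_grad_stats(K, lam)
        print(f"{label:>15}: mean={m:+.4f}  var={v:.4f}")
```

Setting `lam = 0` recovers the one-step meta-gradient and `lam = 1` averages all horizons uniformly. Sweeping `K`, `lam` and `noise_std` in `meta_grad_stats` gives a feel for how the horizon and the mixing weights shift the mean and variance of the estimate in this toy problem.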

Cite

Text

Bonnet et al. "One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning." NeurIPS 2021 Workshops: MetaLearn, 2021.

BibTeX

@inproceedings{bonnet2021neuripsw-one,
  title     = {{One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning}},
  author    = {Bonnet, Clément and Caron, Paul and Barrett, Thomas D and Davies, Ian and Laterre, Alexandre},
  booktitle = {NeurIPS 2021 Workshops: MetaLearn},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/bonnet2021neuripsw-one/}
}