Meta-Model-Based Meta-Policy Optimization

Abstract

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of these methods has yet to be established, and there is currently no theoretical guarantee of their performance in real-world environments. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods on continuous-control benchmarks.
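To make the setting concrete, the sketch below is a hypothetical, heavily simplified illustration of the generic model-based meta-RL loop that methods such as M3PO belong to: collect real data from several tasks, fit a shared (meta) dynamics model on the pooled data, and improve a meta-policy using model-generated rollouts only. All class and function names, the toy linear model and policy, and the random-search update are illustrative assumptions, not the authors' implementation or the paper's algorithm.

```python
# Hypothetical sketch of a generic model-based meta-RL loop (not the authors' M3PO code).
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, NUM_TASKS = 3, 1, 4
HORIZON, ROLLOUT_LEN, ITERATIONS = 20, 5, 10


def true_dynamics(state, action, task_param):
    """Per-task ground-truth dynamics; tasks differ only through task_param."""
    return state + 0.1 * action - 0.05 * task_param * state


class MetaModel:
    """Toy shared dynamics model: one linear map fit on data pooled across tasks."""

    def __init__(self):
        self.W = np.zeros((STATE_DIM + ACTION_DIM, STATE_DIM))

    def fit(self, inputs, targets):
        # Least-squares fit of next state on (state, action) inputs.
        self.W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)

    def predict(self, state, action):
        return np.concatenate([state, action]) @ self.W


class MetaPolicy:
    """Toy meta-policy: linear state feedback with Gaussian exploration noise."""

    def __init__(self):
        self.K = np.zeros((ACTION_DIM, STATE_DIM))

    def act(self, state):
        return self.K @ state + 0.1 * rng.standard_normal(ACTION_DIM)

    def improve(self, model):
        # Placeholder policy update: random-search hill climbing on the
        # model-predicted return (stands in for a gradient-based RL update).
        def model_return(K):
            s, total = np.ones(STATE_DIM), 0.0
            for _ in range(ROLLOUT_LEN):
                a = K @ s
                s = model.predict(s, a)
                total += -np.sum(s ** 2)  # reward: keep the state near the origin
            return total

        candidate = self.K + 0.05 * rng.standard_normal(self.K.shape)
        if model_return(candidate) > model_return(self.K):
            self.K = candidate


model, policy = MetaModel(), MetaPolicy()
task_params = rng.uniform(0.5, 1.5, size=NUM_TASKS)
inputs, targets = [], []

for _ in range(ITERATIONS):
    # 1) Collect real data from every task with the current meta-policy.
    for task in task_params:
        s = np.ones(STATE_DIM)
        for _ in range(HORIZON):
            a = policy.act(s)
            s_next = true_dynamics(s, a, task)
            inputs.append(np.concatenate([s, a]))
            targets.append(s_next)
            s = s_next
    # 2) Fit the shared meta-model on data pooled across tasks.
    model.fit(np.array(inputs), np.array(targets))
    # 3) Improve the meta-policy using short model-generated rollouts only.
    policy.improve(model)

print("learned feedback gain:", policy.K)
```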

Cite

Text

Hiraoka et al. "Meta-Model-Based Meta-Policy Optimization." Proceedings of The 13th Asian Conference on Machine Learning, 2021.

Markdown

[Hiraoka et al. "Meta-Model-Based Meta-Policy Optimization." Proceedings of The 13th Asian Conference on Machine Learning, 2021.](https://mlanthology.org/acml/2021/hiraoka2021acml-metamodelbased/)

BibTeX

@inproceedings{hiraoka2021acml-metamodelbased,
  title     = {{Meta-Model-Based Meta-Policy Optimization}},
  author    = {Hiraoka, Takuya and Imagawa, Takahisa and Tangkaratt, Voot and Osa, Takayuki and Onishi, Takashi and Tsuruoka, Yoshimasa},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  year      = {2021},
  pages     = {129--144},
  volume    = {157},
  url       = {https://mlanthology.org/acml/2021/hiraoka2021acml-metamodelbased/}
}