Model-Based Reinforcement Learning via Meta-Policy Optimization
Abstract
Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic performance as model-free methods. We propose Model-Based Meta-Policy-Optimization (MB-MPO), an approach that foregoes the strong reliance on accurate learned dynamics models. Using an ensemble of learned dynamics models, MB-MPO meta-learns a policy that can quickly adapt to any model in the ensemble with one policy gradient step. This steers the meta-policy towards internalizing consistent dynamics predictions among the ensemble while shifting the burden of behaving optimally w.r.t. the model discrepancies towards the adaptation step. Our experiments show that MB-MPO is more robust to model imperfections than previous model-based approaches. Finally, we demonstrate that our approach is able to match the asymptotic performance of model-free methods while requiring significantly less experience.
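The sketch below illustrates the idea described in the abstract, not the authors' implementation: an ensemble of learned dynamics models plays the role of a task distribution, and the meta-policy is updated so that one inner gradient step per model yields a well-adapted policy. For brevity it differentiates a toy return directly through the models, whereas the paper adapts with policy-gradient estimators on imagined rollouts; all names here (`rollout`, the toy MLP models, the linear policy) are illustrative placeholders.

```python
# Minimal MB-MPO-style meta-learning loop over a model ensemble (illustrative sketch).
import torch
import torch.nn as nn

def rollout(policy_params, model, horizon=20):
    """Simulate a trajectory under one learned model and return a differentiable
    surrogate loss (negative return). Toy 1D state/action setup for illustration."""
    s = torch.zeros(1)
    loss = torch.zeros(())
    for _ in range(horizon):
        a = torch.tanh(policy_params["w"] * s + policy_params["b"])  # toy deterministic policy
        s = model(torch.stack([s.squeeze(), a.squeeze()]))           # model-predicted next state
        loss = loss + (s ** 2).sum()                                 # cost: distance from origin
    return loss

# Ensemble of K dynamics models (tiny MLPs standing in for models fit to real transitions).
K, inner_lr, meta_lr = 5, 0.01, 0.001
ensemble = [nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1)) for _ in range(K)]

# Meta-policy parameters theta.
theta = {"w": torch.zeros(1, requires_grad=True), "b": torch.zeros(1, requires_grad=True)}
meta_opt = torch.optim.Adam(theta.values(), lr=meta_lr)

for meta_iter in range(100):
    meta_loss = torch.zeros(())
    for model in ensemble:
        # Inner step: adapt theta to this particular model with one gradient step.
        inner_loss = rollout(theta, model)
        grads = torch.autograd.grad(inner_loss, list(theta.values()), create_graph=True)
        adapted = {k: v - inner_lr * g for (k, v), g in zip(theta.items(), grads)}
        # Outer objective: performance of the adapted policy under the same model.
        meta_loss = meta_loss + rollout(adapted, model)
    meta_opt.zero_grad()
    (meta_loss / K).backward()  # second-order gradients flow through the inner step
    meta_opt.step()
    # In the full algorithm, adapted policies would also collect real-environment
    # transitions here, and the model ensemble would be refit to that data.
```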
Cite
Text

Clavera et al. "Model-Based Reinforcement Learning via Meta-Policy Optimization." Conference on Robot Learning, 2018.

Markdown

[Clavera et al. "Model-Based Reinforcement Learning via Meta-Policy Optimization." Conference on Robot Learning, 2018.](https://mlanthology.org/corl/2018/clavera2018corl-model/)

BibTeX
@inproceedings{clavera2018corl-model,
title = {{Model-Based Reinforcement Learning via Meta-Policy Optimization}},
author = {Clavera, Ignasi and Rothfuss, Jonas and Schulman, John and Fujita, Yasuhiro and Asfour, Tamim and Abbeel, Pieter},
booktitle = {Conference on Robot Learning},
year = {2018},
pages = {617-629},
url = {https://mlanthology.org/corl/2018/clavera2018corl-model/}
}