COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
Abstract
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real environment exploration using the current policy for dynamics model learning. However, due to the complexity of the real-world environment, the learned dynamics model is inevitably imperfect, and its prediction errors can mislead policy learning and result in sub-optimal solutions. In this paper, we propose $\texttt{COPlanner}$, a planning-driven framework for model-based methods that addresses the problem of inaccurately learned dynamics models through conservative model rollouts and optimistic environment exploration. $\texttt{COPlanner}$ leverages an uncertainty-aware policy-guided model predictive control (UP-MPC) component to plan for multi-step uncertainty estimation. This estimated uncertainty then serves as a penalty during model rollouts and as a bonus during real environment exploration, respectively, when choosing actions. Consequently, $\texttt{COPlanner}$ can avoid model-uncertain regions through conservative model rollouts, thereby alleviating the influence of model error. Simultaneously, it explores high-reward model-uncertain regions to actively reduce model error through optimistic real environment exploration. $\texttt{COPlanner}$ is a plug-and-play framework that can be applied to any Dyna-style model-based method. Experimental results on a series of proprioceptive and visual continuous control tasks demonstrate that both the sample efficiency and the asymptotic performance of strong model-based methods are significantly improved when combined with $\texttt{COPlanner}$.
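The abstract describes the UP-MPC action-selection loop only at a high level. Below is a minimal sketch of how such a loop might look, assuming a policy object with a `sample` method, an ensemble dynamics model whose member disagreement serves as the multi-step uncertainty estimate, and a known reward function; all names and signatures here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def up_mpc_action(state, policy, ensemble, reward_fn, num_candidates=10,
                  horizon=5, uncertainty_weight=1.0, optimistic=False):
    """Hypothetical sketch of uncertainty-aware policy-guided MPC (UP-MPC).

    Scores policy-sampled candidate actions by predicted model return plus
    (optimistic) or minus (conservative) a multi-step uncertainty estimate.
    `policy`, `ensemble`, and `reward_fn` are assumed interfaces.
    """
    best_action, best_score = None, -np.inf
    for _ in range(num_candidates):
        action = policy.sample(state)          # candidate from current policy
        s, a, ret, unc = state, None, 0.0, 0.0
        a = action
        for _ in range(horizon):
            # Each ensemble member predicts the next state; disagreement
            # across members is used as the one-step uncertainty estimate.
            preds = np.stack([m.predict(s, a) for m in ensemble.members])
            unc += preds.std(axis=0).mean()
            s = preds.mean(axis=0)
            ret += reward_fn(s, a)
            a = policy.sample(s)               # policy-guided rollout
        # Uncertainty as a penalty during model rollouts (conservative),
        # as a bonus during real environment exploration (optimistic).
        sign = 1.0 if optimistic else -1.0
        score = ret + sign * uncertainty_weight * unc
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

In this reading, the same planning routine is reused in both phases: `optimistic=False` when generating model rollouts for policy learning, and `optimistic=True` when acting in the real environment to collect data for model learning.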
Cite
Text
Wang et al. "COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL." NeurIPS 2023 Workshops: GenPlan, 2023.
Markdown
[Wang et al. "COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL." NeurIPS 2023 Workshops: GenPlan, 2023.](https://mlanthology.org/neuripsw/2023/wang2023neuripsw-coplanner/)
BibTeX
@inproceedings{wang2023neuripsw-coplanner,
title = {{COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL}},
author = {Wang, Xiyao and Zheng, Ruijie and Sun, Yanchao and Jia, Ruonan and Wongkamjan, Wichayaporn and Xu, Huazhe and Huang, Furong},
booktitle = {NeurIPS 2023 Workshops: GenPlan},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/wang2023neuripsw-coplanner/}
}