Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy

Abstract

Model-based reinforcement learning (RL) achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a "global" dynamics model that fits the state-action visitation distribution of all historical policies. However, in this paper, we find that learning a global dynamics model does not necessarily benefit model prediction for the current policy, since the policy in use is constantly evolving. The evolving policy causes shifts in the state-action visitation distribution during training. We theoretically analyze how the distribution of historical policies affects model learning and model rollouts. We then propose a novel model-based RL method, named Policy-adaptation Model-based Actor-Critic (PMAC), which learns a policy-adapted dynamics model based on a policy-adaptation mechanism. This mechanism dynamically adjusts the historical policy mixture distribution so that the learned model continually adapts to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that PMAC achieves state-of-the-art asymptotic performance and nearly twice the sample efficiency of prior model-based methods.
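
The abstract describes the policy-adaptation mechanism only at a high level. Below is a minimal sketch of one way such an adaptation could look, assuming a simple recency-based reweighting of replay data in the dynamics-model loss; the exponential weighting, the decay parameter, and the function names are illustrative assumptions, not PMAC's actual mixture-adjustment rule.

import numpy as np

def policy_adapted_weights(policy_ids, current_id, decay=0.9):
    # Recency weighting over the replay buffer: transitions collected by more
    # recent policies receive larger weight in the dynamics-model loss.
    # The exponential form and the decay value are assumptions for illustration.
    ages = current_id - np.asarray(policy_ids)   # 0 for data from the newest policy
    w = decay ** ages
    return w / w.sum()

def weighted_model_loss(pred_next_states, next_states, weights):
    # Weighted one-step prediction (MSE) loss for the learned dynamics model.
    per_sample = np.mean((pred_next_states - next_states) ** 2, axis=-1)
    return float(np.sum(weights * per_sample))

# Example: buffer data from policy iterations 0..4, current policy is iteration 4.
# The resulting weights put the most mass on the newest policy's transitions,
# so the model is fit to the state-action distribution of the evolving policy.
w = policy_adapted_weights(policy_ids=np.array([0, 1, 2, 3, 4]), current_id=4)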

Cite

Text

Wang et al. "Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy." ICML 2022 Workshops: DARL, 2022.

Markdown

[Wang et al. "Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy." ICML 2022 Workshops: DARL, 2022.](https://mlanthology.org/icmlw/2022/wang2022icmlw-live/)

BibTeX

@inproceedings{wang2022icmlw-live,
  title     = {{Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy}},
  author    = {Wang, Xiyao and Wongkamjan, Wichayaporn and Huang, Furong},
  booktitle = {ICML 2022 Workshops: DARL},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/wang2022icmlw-live/}
}