Model-Based Offline Meta-Reinforcement Learning with Regularization
Abstract
Existing offline reinforcement learning (RL) methods face several major challenges, most notably the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these challenges, aiming to learn an informative meta-policy from a collection of tasks. Nevertheless, as shown in our empirical studies, offline Meta-RL can be outperformed by offline single-task RL methods on tasks with high-quality datasets, indicating that the right balance has to be delicately calibrated between "exploring" out-of-distribution state-actions by following the meta-policy and "exploiting" the offline dataset by staying close to the behavior policy. Motivated by this empirical analysis, we propose model-based offline Meta-RL with regularized Policy Optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative meta-policy for safe exploration of out-of-distribution state-actions. In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method for within-task policy optimization, as a key building block of MerPO, using both conservative policy evaluation and regularized policy improvement; the intrinsic tradeoff therein is achieved by striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy. We theoretically show that the learned policy offers guaranteed improvement over both the behavior policy and the meta-policy, thereby ensuring performance improvement on new tasks via offline Meta-RL. Our experiments corroborate the superior performance of MerPO over existing offline Meta-RL methods.
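To make the regularized policy improvement concrete, below is a minimal sketch of the interpolation idea described in the abstract: an actor loss that trades off a critic estimate against two regularizers, one pulling the policy toward the behavior policy and one toward the meta-policy. All names (omega, lam, regularized_policy_loss) and the exact objective form are illustrative assumptions, not the paper's actual RAC implementation.

```python
# Hedged sketch (assumed form): actor loss = -Q + lam * [omega * KL(pi || behavior) + (1 - omega) * KL(pi || meta)].
# omega near 1 "exploits" the offline data by staying close to the behavior policy;
# omega near 0 "explores" out-of-distribution actions by following the meta-policy.
import torch
from torch.distributions import Normal, kl_divergence


def regularized_policy_loss(q_value, pi, behavior_pi, meta_pi, omega=0.5, lam=1.0):
    kl_behavior = kl_divergence(pi, behavior_pi).sum(-1)  # per-state KL to the behavior policy
    kl_meta = kl_divergence(pi, meta_pi).sum(-1)          # per-state KL to the meta-policy
    reg = omega * kl_behavior + (1.0 - omega) * kl_meta
    return (-q_value + lam * reg).mean()


# Toy usage: a batch of 4 states with 2-dimensional Gaussian policies.
pi = Normal(torch.zeros(4, 2, requires_grad=True), torch.ones(4, 2))
behavior_pi = Normal(torch.full((4, 2), 0.5), torch.ones(4, 2))
meta_pi = Normal(torch.full((4, 2), -0.5), torch.ones(4, 2))
q_value = torch.randn(4)  # stand-in for a conservative critic estimate Q(s, a ~ pi)

loss = regularized_policy_loss(q_value, pi, behavior_pi, meta_pi, omega=0.7)
loss.backward()
```

Under this reading, sweeping omega interpolates between a purely behavior-regularized offline update and one that trusts the meta-policy; the paper's stated guarantee is that the resulting policy improves over both anchors.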
Cite
Text
Lin et al. "Model-Based Offline Meta-Reinforcement Learning with Regularization." International Conference on Learning Representations, 2022.

Markdown
[Lin et al. "Model-Based Offline Meta-Reinforcement Learning with Regularization." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/lin2022iclr-modelbased/)

BibTeX
@inproceedings{lin2022iclr-modelbased,
title = {{Model-Based Offline Meta-Reinforcement Learning with Regularization}},
author = {Lin, Sen and Wan, Jialin and Xu, Tengyu and Liang, Yingbin and Zhang, Junshan},
booktitle = {International Conference on Learning Representations},
year = {2022},
url = {https://mlanthology.org/iclr/2022/lin2022iclr-modelbased/}
}