Model-Based Offline Policy Optimization with Distribution Correcting Regularization

Abstract

Offline Reinforcement Learning (RL) aims to learn effective policies by leveraging previously collected datasets, without further interaction with the environment. Model-based algorithms, which first learn a dynamics model from the offline dataset and then conservatively learn a policy under that model, have demonstrated great potential in offline RL. Previous model-based algorithms typically penalize rewards with the uncertainty of the dynamics model; this uncertainty, however, is not necessarily consistent with the actual model error. Inspired by a lower bound on the return under the real dynamics, in this paper we present a model-based alternative called DROP for offline RL. In particular, DROP estimates the density ratio between the model-rollout distribution and the offline data distribution via the DICE framework [45], and then regularizes the model-predicted rewards with this ratio for pessimistic policy learning. Extensive experiments show that DROP achieves comparable or better performance than baselines on widely studied offline RL benchmarks.
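The reward regularization described in the abstract can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the function name, the penalty coefficient `lam`, and the exact penalty form (a log-ratio penalty applied where model rollouts are over-represented relative to the data) are assumptions for exposition; the density ratio itself would come from a DICE-style estimator.

```python
import numpy as np

def pessimistic_reward(r_model, ratio, lam=1.0):
    """Regularize model-predicted rewards with a density ratio (sketch).

    r_model : model-predicted rewards for rollout transitions.
    ratio   : estimated density ratio of the model-rollout distribution
              to the offline data distribution (e.g. from a DICE-style
              estimator); values > 1 mark transitions the model visits
              more often than the data supports.
    lam     : penalty coefficient (hypothetical hyperparameter).
    """
    # Penalize only where the model-rollout distribution exceeds the
    # data distribution; in-distribution transitions are left unchanged.
    penalty = np.log(np.clip(ratio, 1.0, None))
    return r_model - lam * penalty
```

In-distribution transitions (ratio close to 1) keep their predicted reward, while out-of-distribution rollouts are penalized, yielding a pessimistic reward signal for policy learning.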

Cite

Text

Shen et al. "Model-Based Offline Policy Optimization with Distribution Correcting Regularization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86486-6_11

Markdown

[Shen et al. "Model-Based Offline Policy Optimization with Distribution Correcting Regularization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/shen2021ecmlpkdd-modelbased/) doi:10.1007/978-3-030-86486-6_11

BibTeX

@inproceedings{shen2021ecmlpkdd-modelbased,
  title     = {{Model-Based Offline Policy Optimization with Distribution Correcting Regularization}},
  author    = {Shen, Jian and Chen, Mingcheng and Zhang, Zhicheng and Yang, Zhengyu and Zhang, Weinan and Yu, Yong},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {174--189},
  doi       = {10.1007/978-3-030-86486-6_11},
  url       = {https://mlanthology.org/ecmlpkdd/2021/shen2021ecmlpkdd-modelbased/}
}