Model-Based Offline Reinforcement Learning with Count-Based Conservatism

Abstract

In this paper, we present $\texttt{Count-MORL}$, a model-based offline reinforcement learning method that integrates count-based conservatism. Our method uses count estimates of state-action pairs to quantify model estimation error; to the best of our knowledge, it is the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. We first show that the estimation error is inversely proportional to the frequency of state-action pairs. Second, we demonstrate that the policy learned under the count-based conservative model offers near-optimal performance guarantees. Through extensive numerical experiments, we validate that $\texttt{Count-MORL}$ with a hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is accessible at https://github.com/oh-lab/Count-MORL.
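
To make the idea of count-based conservatism with hash codes concrete, the following is a minimal sketch and not the authors' implementation (see the repository linked above for that). It assumes a random-projection sign hash to discretize continuous state-action pairs, and an illustrative penalty of the form `beta / sqrt(count + 1)`; the class `HashCounter`, the function `penalized_reward`, and the parameters `n_bits` and `beta` are all hypothetical names chosen for this example.

```python
import numpy as np
from collections import Counter


class HashCounter:
    """Counts state-action pairs through a random-projection sign hash (illustrative)."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; each row contributes one bit of the hash code.
        self.proj = rng.standard_normal((n_bits, dim))
        self.counts = Counter()

    def _code(self, state, action):
        x = np.concatenate([state, action])
        bits = (self.proj @ x) > 0  # sign pattern of the projections
        return bits.tobytes()       # hashable key for the counter

    def update(self, state, action):
        self.counts[self._code(state, action)] += 1

    def count(self, state, action):
        # Counter returns 0 for codes that were never seen.
        return self.counts[self._code(state, action)]


def penalized_reward(reward, count, beta=1.0):
    # Assumed penalty form: shrinks as the pair's count grows.
    # The +1 avoids division by zero for unseen pairs.
    return reward - beta / np.sqrt(count + 1)
```

In this sketch, one would first call `update` on every state-action pair in the offline dataset, then apply `penalized_reward` to rewards of model-generated transitions so that rarely observed regions are treated more pessimistically. Nearby inputs tend to share a hash code, so the count acts as a coarse density estimate rather than an exact tally.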

Cite

Text

Kim and Oh. "Model-Based Offline Reinforcement Learning with Count-Based Conservatism." International Conference on Machine Learning, 2023.

Markdown

[Kim and Oh. "Model-Based Offline Reinforcement Learning with Count-Based Conservatism." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/kim2023icml-modelbased/)

BibTeX

@inproceedings{kim2023icml-modelbased,
  title     = {{Model-Based Offline Reinforcement Learning with Count-Based Conservatism}},
  author    = {Kim, Byeongchan and Oh, Min-Hwan},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {16728--16746},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/kim2023icml-modelbased/}
}