Offline Reinforcement Learning with On-Policy Q-Function Regularization

Abstract

The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is itself hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly. We propose two algorithms that take advantage of the estimated Q-function through regularization, and demonstrate that they exhibit strong performance on the D4RL benchmarks.
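The SARSA-style estimate the abstract refers to can be illustrated with a minimal sketch (this is an illustrative toy, not the paper's implementation; the function name, tabular setting, and toy dataset are assumptions for exposition). The key property is that each transition carries the next action actually taken by the behavior policy, so the bootstrap target never queries out-of-distribution actions:

```python
import numpy as np

def sarsa_q_estimate(transitions, n_states, n_actions,
                     gamma=0.99, lr=0.1, n_sweeps=200):
    """Tabular SARSA-style evaluation of the behavior policy's Q-function
    from an offline dataset of (s, a, r, s', a', done) tuples.

    Because the bootstrap uses Q[s', a'] with a' drawn from the dataset
    (the behavior policy's own next action), the update avoids the
    extrapolation error of maximizing over unseen actions.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_sweeps):
        for s, a, r, s2, a2, done in transitions:
            target = r + (0.0 if done else gamma * Q[s2, a2])
            Q[s, a] += lr * (target - Q[s, a])
    return Q

# Toy 2-state chain (hypothetical data): state 0 -> state 1 with reward 0,
# then state 1 terminates with reward 1.
data = [(0, 0, 0.0, 1, 0, False),
        (1, 0, 1.0, 0, 0, True)]
Q = sarsa_q_estimate(data, n_states=2, n_actions=1, gamma=0.9)
# At convergence, Q[1, 0] approaches 1.0 and Q[0, 0] approaches 0.9 * 1.0.
```

In the paper's deep-RL setting this estimate would be a neural Q-network trained on the same SARSA target; the tabular version above only shows why the target is in-distribution by construction.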

Cite

Text

Shi et al. "Offline Reinforcement Learning with On-Policy Q-Function Regularization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_27

Markdown

[Shi et al. "Offline Reinforcement Learning with On-Policy Q-Function Regularization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/shi2023ecmlpkdd-offline/) doi:10.1007/978-3-031-43421-1_27

BibTeX

@inproceedings{shi2023ecmlpkdd-offline,
  title     = {{Offline Reinforcement Learning with On-Policy Q-Function Regularization}},
  author    = {Shi, Laixi and Dadashi, Robert and Chi, Yuejie and Castro, Pablo Samuel and Geist, Matthieu},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {455--471},
  doi       = {10.1007/978-3-031-43421-1_27},
  url       = {https://mlanthology.org/ecmlpkdd/2023/shi2023ecmlpkdd-offline/}
}