A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

Abstract

We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting, which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an $\epsilon$-optimal policy with $\mathcal{O}(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of $\mathcal{O}(\epsilon^{-2})$ under a partial data coverage assumption. Our work improves upon a recent work that requires $\mathcal{O}(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
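To make the constrained setting in the abstract concrete, the following sketch uses illustrative notation introduced here (primary reward $r$, auxiliary reward signals $u_i$, thresholds $\tau_i$, discount factor $\gamma$, multipliers $\lambda_i$), not notation taken from the paper: the learner seeks a policy $\pi$ maximizing the discounted return of the primary reward while keeping the discounted returns of the additional reward signals above given thresholds,

$$\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big] \quad \text{subject to} \quad \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, u_i(s_t, a_t)\Big] \ge \tau_i, \qquad i = 1, \dots, m.$$

A primal-dual approach of the kind named in the title typically attaches Lagrange multipliers $\lambda_i \ge 0$ to the constraints and seeks a saddle point of the Lagrangian

$$\mathcal{L}(\pi, \lambda) \;=\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\Big(r(s_t,a_t) + \sum_{i=1}^{m} \lambda_i\, u_i(s_t,a_t)\Big)\Big] \;-\; \sum_{i=1}^{m} \lambda_i \tau_i.$$

This is only the standard formulation of constrained RL and is not claimed to reproduce the paper's exact algorithm or analysis.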

Cite

Text

Hong and Tewari. "A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs." International Conference on Machine Learning, 2024.

Markdown

[Hong and Tewari. "A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/hong2024icml-primaldual/)

BibTeX

@inproceedings{hong2024icml-primaldual,
  title     = {{A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs}},
  author    = {Hong, Kihyuk and Tewari, Ambuj},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {18711--18737},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/hong2024icml-primaldual/}
}