Safe and Efficient: A Primal-Dual Method for Offline Convex CMDPs Under Partial Data Coverage
Abstract
Offline safe reinforcement learning (RL) aims to find an optimal policy from a pre-collected dataset when online data collection is impractical or risky. We propose a novel linear programming (LP) based primal-dual algorithm for convex CMDPs that incorporates ``uncertainty'' parameters to improve data efficiency while requiring only a partial data coverage assumption. Our theoretical results achieve a sample complexity of $\mathcal{O}\left(\frac{1}{(1-\gamma)\sqrt{n}}\right)$ under general function approximation, improving the current state-of-the-art by a factor of $\frac{1}{1-\gamma}$, where $n$ is the number of samples in the offline dataset and $\gamma$ is the discount factor. Numerical experiments validate our theoretical findings, demonstrating the practical efficacy of our approach in achieving improved safety and learning efficiency in offline safe settings.
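For context, a minimal sketch of the standard occupancy-measure LP that primal-dual methods of this kind build on, written for a tabular CMDP with reward $r$, utility $u$, constraint threshold $b$, initial distribution $\rho$, and transition kernel $P$ (the notation here is illustrative; the paper's convex-objective, uncertainty-parameterized formulation differs in its details):

\begin{align*}
\max_{\mu \ge 0}\quad & \sum_{s,a} \mu(s,a)\, r(s,a) \\
\text{s.t.}\quad & \sum_{s,a} \mu(s,a)\, u(s,a) \ge b, \\
& \sum_{a'} \mu(s',a') = (1-\gamma)\,\rho(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \mu(s,a) \quad \forall s'.
\end{align*}

Dualizing the utility constraint with a multiplier $\lambda \ge 0$ yields the saddle-point problem that primal-dual algorithms solve; any feasible occupancy measure $\mu$ then induces a policy via $\pi(a \mid s) \propto \mu(s,a)$.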
Cite
Text
Zhang et al. "Safe and Efficient: A Primal-Dual Method for Offline Convex CMDPs Under Partial Data Coverage." Neural Information Processing Systems, 2024. doi:10.52202/079017-1079
BibTeX
@inproceedings{zhang2024neurips-safe,
title = {{Safe and Efficient: A Primal-Dual Method for Offline Convex CMDPs Under Partial Data Coverage}},
author = {Zhang, Haobo and Peng, Xiyue and Wei, Honghao and Liu, Xin},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1079},
url = {https://mlanthology.org/neurips/2024/zhang2024neurips-safe/}
}