Average-Constrained Policy Optimization

Abstract

Reinforcement Learning (RL) with constraints is becoming an increasingly important problem for various applications. Often, the average criterion is more suitable than the discounted criterion. Yet, RL for constrained MDPs (CMDPs) with the average criterion remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well in the average CMDP setting. In this paper, we introduce a new policy optimization algorithm with function approximation for CMDPs under the average criterion. We develop basic sensitivity theory for average MDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance and, through extensive experiments on various challenging MuJoCo environments, show its superior performance compared to other state-of-the-art algorithms adapted to the average CMDP setting.
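For context, a minimal sketch of the standard average-criterion CMDP objective the abstract refers to (the notation below is illustrative and not taken from the paper): the agent maximizes the long-run average reward subject to long-run average cost constraints.

```latex
% Standard average-reward CMDP formulation (illustrative notation, not the paper's):
% maximize the long-run average reward subject to long-run average cost constraints.
\max_{\pi} \; \liminf_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\limsup_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m.
```

Here $r$ is the reward, $c_i$ are the cost functions, and $d_i$ the corresponding constraint thresholds; the discounted CMDP setting replaces these time averages with discounted sums.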

Cite

Text

Agnihotri et al. "Average-Constrained Policy Optimization." NeurIPS 2023 Workshops: OPT, 2023.

Markdown

[Agnihotri et al. "Average-Constrained Policy Optimization." NeurIPS 2023 Workshops: OPT, 2023.](https://mlanthology.org/neuripsw/2023/agnihotri2023neuripsw-averageconstrained/)

BibTeX

@inproceedings{agnihotri2023neuripsw-averageconstrained,
  title     = {{Average-Constrained Policy Optimization}},
  author    = {Agnihotri, Akhil and Jain, Rahul and Luo, Haipeng},
  booktitle = {NeurIPS 2023 Workshops: OPT},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/agnihotri2023neuripsw-averageconstrained/}
}