Anytime-Competitive Reinforcement Learning with Policy Prior

Abstract

This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows that the reward of ACRL asymptotically matches the optimal reward achievable under the anytime competitive constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.
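As a rough sketch of what "bounded cost in each round of any episode against a policy prior" means (our notation, not taken verbatim from the paper): let $H$ be the episode length, let $c_h(\pi)$ and $c_h(\pi^\dagger)$ denote the round-$h$ costs of the learned policy $\pi$ and the policy prior $\pi^\dagger$, and let $\lambda \ge 0$ and $b \ge 0$ be slackness parameters. An anytime competitive constraint of this flavor requires

$$
\sum_{h=1}^{H'} c_h(\pi) \;\le\; (1+\lambda)\sum_{h=1}^{H'} c_h(\pi^\dagger) \;+\; H' b, \qquad \forall\, H' \in \{1,\dots,H\},
$$

i.e., the cumulative cost of $\pi$ may exceed that of the prior by at most a multiplicative factor $(1+\lambda)$ plus an additive budget $b$ per round, at every prefix of every episode, rather than only in expectation over episodes as in standard CMDPs.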

Cite

Text

Yang et al. "Anytime-Competitive Reinforcement Learning with Policy Prior." Neural Information Processing Systems, 2023.

Markdown

[Yang et al. "Anytime-Competitive Reinforcement Learning with Policy Prior." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/yang2023neurips-anytimecompetitive/)

BibTeX

@inproceedings{yang2023neurips-anytimecompetitive,
  title     = {{Anytime-Competitive Reinforcement Learning with Policy Prior}},
  author    = {Yang, Jianyi and Li, Pengfei and Li, Tongxin and Wierman, Adam and Ren, Shaolei},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/yang2023neurips-anytimecompetitive/}
}