A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes

Abstract

This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). For a sufficiently large learning horizon K, the proposed algorithm achieves sublinear regret and zero constraint violation. The bounds depend on the number of states S, the number of actions A, and two constants that are independent of the learning horizon K.
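
For context, a common way to formalize these two guarantees in the average-reward CMDP setting is the following (a standard formulation, not quoted from the paper; the symbols $J^{*}$, $r$, $c$, and $b$ are illustrative and may differ from the authors' notation):

$$
\text{Regret}(K) \;=\; \sum_{k=1}^{K} \bigl( J^{*} - r(s_k, a_k) \bigr),
\qquad
\text{Violation}(K) \;=\; \Bigl[ \sum_{k=1}^{K} \bigl( b - c(s_k, a_k) \bigr) \Bigr]_{+},
$$

where $J^{*}$ is the optimal long-term average reward over stationary policies whose long-term average utility meets the threshold $b$, and $r$ and $c$ are the per-step reward and utility functions. Under these definitions, sublinear regret means $\text{Regret}(K) = o(K)$, and zero constraint violation means $\text{Violation}(K) = 0$ once $K$ is sufficiently large.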

Cite

Text

Wei et al. "A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I4.20302

Markdown

[Wei et al. "A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/wei2022aaai-provably/) doi:10.1609/AAAI.V36I4.20302

BibTeX

@inproceedings{wei2022aaai-provably,
  title     = {{A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes}},
  author    = {Wei, Honghao and Liu, Xin and Ying, Lei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {3868--3876},
  doi       = {10.1609/AAAI.V36I4.20302},
  url       = {https://mlanthology.org/aaai/2022/wei2022aaai-provably/}
}