Regret Guarantees for Model-Based Reinforcement Learning with Long-Term Average Constraints

Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

UAI 2022 pp. 22-31

/uai/2022/agarwal2022uai-regret/

Abstract

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with an ergodic Markov Decision Process. At every interaction, the agent obtains a reward and incurs $K$ costs. The agent aims to maximize the long-term average reward while simultaneously keeping the $K$ long-term average costs lower than a certain threshold. In this paper, we propose \NAM, a posterior sampling based algorithm using which the agent can learn optimal policies to interact with the CMDP. We show that with the assumption of slackness, characterized by $\kappa$, the optimization problem is feasible for the sampled MDPs. Further, for MDP with $S$ states, $A$ actions, and mixing time $T_M$, we prove that following \NAM{} algorithm, the agent can bound the regret of not accumulating rewards from an optimal policy by $\Tilde{O}(T_MS\sqrt{AT})$. Further, we show that the violations for any of the $K$ constraints is also bounded by $\Tilde{O}(T_MS\sqrt{AT})$. To the best of our knowledge, this is the first work that obtains a $\Tilde{O}(\sqrt{T})$ regret bounds for ergodic MDPs with long-term average constraints using a posterior sampling method.

PDF UAI OpenReview Semantic Scholar

Cite

Text

Agarwal et al. "Regret Guarantees for Model-Based Reinforcement Learning with Long-Term Average Constraints." Uncertainty in Artificial Intelligence, 2022.

Markdown

[Agarwal et al. "Regret Guarantees for Model-Based Reinforcement Learning with Long-Term Average Constraints." Uncertainty in Artificial Intelligence, 2022.](https://mlanthology.org/uai/2022/agarwal2022uai-regret/)

BibTeX

@inproceedings{agarwal2022uai-regret,
  title     = {{Regret Guarantees for Model-Based Reinforcement Learning with Long-Term Average Constraints}},
  author    = {Agarwal, Mridul and Bai, Qinbo and Aggarwal, Vaneet},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2022},
  pages     = {22-31},
  volume    = {180},
  url       = {https://mlanthology.org/uai/2022/agarwal2022uai-regret/}
}