Achieving Sub-Linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation

Abstract

We study the infinite horizon average reward constrained Markov Decision Process (CMDP). In contrast to existing works, which are model-based and restricted to finite state spaces, we consider the model-free linear CMDP setup. We first propose a computationally inefficient algorithm and show that $\tilde{\mathcal{O}}(\sqrt{d^3T})$ regret and constraint violation can be achieved, where $T$ is the number of interactions and $d$ is the dimension of the feature mapping. We also propose an efficient variant based on a primal-dual adaptation of the LSVI-UCB algorithm and show that $\tilde{\mathcal{O}}((dT)^{3/4})$ regret and constraint violation can be achieved. This improves the known $\tilde{\mathcal{O}}(T^{5/6})$ regret bound for finite state-space model-free constrained RL, which was obtained under a stronger assumption than ours. Finally, we develop an efficient policy-based algorithm via a novel adaptation of the MDP-EXP2 algorithm to our primal-dual setup, achieving $\tilde{\mathcal{O}}(\sqrt{T})$ regret and even a zero constraint-violation bound under a stronger set of assumptions.
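For readers unfamiliar with the setting, the guarantees above can be read against the following standard average-reward CMDP formulation. This is a minimal sketch using generic notation (reward $r$, utility $g$, constraint threshold $b$, dual variable $\lambda$), which may differ from the paper's exact definitions: the learner maximizes the long-run average reward subject to an average-utility constraint,

$$
J_r(\pi) = \liminf_{T \to \infty} \frac{1}{T}\,\mathbb{E}\!\left[\sum_{t=1}^{T} r(s_t, a_t)\right], \qquad \text{s.t.} \quad J_g(\pi) \ge b,
$$

and performance over $T$ interactions is measured by the regret and constraint violation relative to the best feasible policy $\pi^*$,

$$
\mathrm{Regret}(T) = T\,J_r(\pi^*) - \sum_{t=1}^{T} r(s_t, a_t), \qquad \mathrm{Violation}(T) = \left(T b - \sum_{t=1}^{T} g(s_t, a_t)\right)_{+}.
$$

A primal-dual method of the kind described in the abstract typically alternates optimistic value estimation on the Lagrangian reward $r + \lambda g$ with a projected gradient step on the dual variable, e.g. $\lambda_{t+1} = \Pi_{[0,\Lambda]}\!\left(\lambda_t + \eta\,(b - \hat{J}_g)\right)$, where $\hat{J}_g$ is an estimate of the average utility and $\eta$ is a step size.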

Cite

Text

Ghosh et al. "Achieving Sub-Linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation." International Conference on Learning Representations, 2023.

Markdown

[Ghosh et al. "Achieving Sub-Linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/ghosh2023iclr-achieving/)

BibTeX

@inproceedings{ghosh2023iclr-achieving,
  title     = {{Achieving Sub-Linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation}},
  author    = {Ghosh, Arnob and Zhou, Xingyu and Shroff, Ness},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/ghosh2023iclr-achieving/}
}