Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms

Abstract

Actor-critic methods have been applied widely across reinforcement learning tasks, especially those with large state-action spaces. In this paper, we consider actor-critic and natural actor-critic algorithms with function approximation for constrained Markov decision processes (C-MDP) with inequality constraints, and carry out a non-asymptotic analysis of both algorithms in a non-i.i.d. (Markovian) setting. We consider the long-run average cost criterion, where both the objective and the constraint functions are policy-dependent long-run averages of prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that both the Constrained Actor-Critic (C-AC) and Constrained Natural Actor-Critic (C-NAC) algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$) of the performance (Lagrange) function $L(\theta,\gamma)$, with a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2.5})$. We also report experimental results on three Safety-Gym environments.
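
To make the three-timescale Lagrangian idea concrete, here is a minimal toy sketch. It is not the C-AC or C-NAC algorithm from the paper. It assumes a hypothetical scalar problem (minimize $(\theta-2)^2$ subject to $\theta \leq 1$), constant step sizes, and Gaussian sample noise, all chosen purely for illustration. The function name `run_three_timescale` and the variables `v`, `a`, `b`, `c` are likewise assumptions. A "critic" tracks the constraint violation on the fastest timescale, the "actor" descends the Lagrangian on an intermediate timescale, and the multiplier $\gamma$ ascends on the slowest one.

```python
import random


def run_three_timescale(steps=20000, seed=0):
    """Toy primal-dual loop with three step sizes (critic > actor > multiplier).

    Illustrative assumption only: objective (theta - 2)^2, constraint theta <= 1.
    """
    rng = random.Random(seed)
    theta = 0.0   # "actor" parameter
    gamma = 0.0   # Lagrange multiplier, kept non-negative
    v = 0.0       # "critic": running estimate of the constraint violation
    c, a, b = 0.1, 0.02, 0.002  # fastest, intermediate, slowest step sizes

    for _ in range(steps):
        # Noisy sample of the constraint violation g(theta) = theta - 1.
        sample = (theta - 1.0) + rng.gauss(0.0, 0.1)
        # Fastest timescale: the critic tracks the constraint violation.
        v += c * (sample - v)
        # Intermediate timescale: the actor descends the Lagrangian
        # L(theta, gamma) = (theta - 2)^2 + gamma * (theta - 1).
        theta -= a * (2.0 * (theta - 2.0) + gamma)
        # Slowest timescale: projected ascent on the multiplier.
        gamma = max(0.0, gamma + b * v)
    return theta, gamma


if __name__ == "__main__":
    theta, gamma = run_three_timescale()
    print(f"theta = {theta:.3f}, gamma = {gamma:.3f}")
```

For this toy problem the constrained optimum is $\theta^* = 1$ with multiplier $\gamma^* = 2$ (from the stationarity condition $2(\theta - 2) + \gamma = 0$), so the iterates should settle near those values. The separation of step sizes lets each slower variable see the faster ones as approximately converged.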

Cite

Text

Panda and Bhatnagar. "Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms." Uncertainty in Artificial Intelligence, 2024.

Markdown

[Panda and Bhatnagar. "Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms." Uncertainty in Artificial Intelligence, 2024.](https://mlanthology.org/uai/2024/panda2024uai-finitetime/)

BibTeX

@inproceedings{panda2024uai-finitetime,
  title     = {{Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms}},
  author    = {Panda, Prashansa and Bhatnagar, Shalabh},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2024},
  pages     = {2787--2834},
  volume    = {244},
  url       = {https://mlanthology.org/uai/2024/panda2024uai-finitetime/}
}