Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Abstract

Designing a safe policy for uncertain environments is crucial in real-world control systems. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm guaranteed to identify a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional policy gradient approach to the Lagrangian max-min formulation can become trapped in suboptimal solutions. This occurs when its inner minimization encounters a sum of conflicting gradients from the objective and constraint functions. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the constraints. Building on the epigraph form, we propose a bisection search algorithm with a policy gradient subroutine and prove that it identifies an $\varepsilon$-optimal policy in an RCMDP with $\widetilde{\mathcal{O}}(\varepsilon^{-4})$ robust policy evaluations.

Cite

Text

Kitamura et al. "Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form." International Conference on Learning Representations, 2025.

Markdown

[Kitamura et al. "Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/kitamura2025iclr-nearoptimal/)

BibTeX

@inproceedings{kitamura2025iclr-nearoptimal,
  title     = {{Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form}},
  author    = {Kitamura, Toshinori and Kozuno, Tadashi and Kumagai, Wataru and Hoshino, Kenta and Hosoe, Yohei and Kasaura, Kazumi and Hamaya, Masashi and Parmas, Paavo and Matsuo, Yutaka},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/kitamura2025iclr-nearoptimal/}
}