Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Kitamura, Toshinori; Kozuno, Tadashi; Kumagai, Wataru; Hoshino, Kenta; Hosoe, Yohei; Kasaura, Kazumi; Hamaya, Masashi; Parmas, Paavo; Matsuo, Yutaka

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo

ICLR 2025

/iclr/2025/kitamura2025iclr-nearoptimal/

Abstract

Designing a safe policy for uncertain environments is crucial in real-world control systems. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm guaranteed to identify a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional policy gradient approach to the Lagrangian max-min formulation can become trapped in suboptimal solutions. This occurs when its inner minimization encounters a sum of conflicting gradients from the objective and constraint functions. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the constraints. Building on the epigraph form, we propose a bisection search algorithm with a policy gradient subroutine and prove that it identifies an $\varepsilon$-optimal policy in an RCMDP with $\widetilde{\mathcal{O}}(\varepsilon^{-4})$ robust policy evaluations.

PDF ICLR Semantic Scholar

Cite

Text

Kitamura et al. "Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form." International Conference on Learning Representations, 2025.

Markdown

[Kitamura et al. "Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/kitamura2025iclr-nearoptimal/)

BibTeX

@inproceedings{kitamura2025iclr-nearoptimal,
  title     = {{Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form}},
  author    = {Kitamura, Toshinori and Kozuno, Tadashi and Kumagai, Wataru and Hoshino, Kenta and Hosoe, Yohei and Kasaura, Kazumi and Hamaya, Masashi and Parmas, Paavo and Matsuo, Yutaka},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/kitamura2025iclr-nearoptimal/}
}