Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity

Wang, Qiuhao; Zha, Yuqi; Ho, Chin Pang; Petrik, Marek

ICML 2025, pp. 65368-65399

Abstract

Robust Markov Decision Processes (MDPs) offer a promising framework for computing reliable policies under model uncertainty. While policy gradient methods have gained increasing popularity in robust discounted MDPs, their application to the average-reward criterion remains largely unexplored. This paper proposes Robust Projected Policy Gradient (RP2G), the first generic policy gradient method for robust average-reward MDPs (RAMDPs) that is applicable beyond the typical rectangularity assumption on transition ambiguity. In contrast to existing robust policy gradient algorithms, RP2G incorporates an adaptive decreasing tolerance mechanism for efficient policy updates at each iteration. We also present a comprehensive convergence analysis of RP2G for solving ergodic tabular RAMDPs. Furthermore, we present the first study of the inner worst-case transition evaluation problem in RAMDPs, proposing two gradient-based algorithms tailored for rectangular and general ambiguity sets, each with provable convergence guarantees. Numerical experiments confirm the global convergence of our new algorithm and demonstrate its superior performance.
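The abstract describes RP2G only at a high level. As a rough illustration of the two ingredients it names, an inner worst-case evaluation of the average reward (gain) and an outer projected policy-gradient step, here is a minimal pure-Python sketch. It is not the authors' algorithm, and it rests on several assumptions. The ambiguity set is a hypothetical finite list of transition kernels, which is a simple non-rectangular set, so the inner problem is solved exactly by enumeration rather than with the paper's gradient-based inner solvers and adaptive tolerance. Every kernel is assumed to make each policy's Markov chain ergodic. The policy uses the direct tabular parametrization with the standard average-reward policy gradient dρ/dπ(a|s) = d(s)·q(s,a), evaluated at the currently worst kernel. The names `evaluate`, `project_simplex`, `robust_projected_pg`, `KERNELS`, and `REWARDS` are hypothetical.

```python
"""Toy robust projected policy gradient for a tabular average-reward MDP.

Hypothetical sketch: the ambiguity set is a finite list of kernels and the
inner worst case is found by enumeration (not the paper's inner solvers).
"""


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def evaluate(P, r, pi):
    """Return gain rho, stationary distribution d, and bias h (with h[0] = 0).

    P[s][a][t] is a transition probability, r[s][a] a reward, pi[s][a] a
    policy; the induced chain is assumed ergodic.
    """
    n, acts = len(r), range(len(r[0]))
    Ppi = [[sum(pi[s][a] * P[s][a][t] for a in acts) for t in range(n)] for s in range(n)]
    rpi = [sum(pi[s][a] * r[s][a] for a in acts) for s in range(n)]
    # Stationary distribution: (Ppi^T - I) d = 0, row 0 replaced by sum(d) = 1.
    A = [[Ppi[t][s] - float(s == t) for t in range(n)] for s in range(n)]
    A[0] = [1.0] * n
    d = solve(A, [1.0] + [0.0] * (n - 1))
    rho = sum(d[s] * rpi[s] for s in range(n))
    # Bias: (I - Ppi) h = rpi - rho, row 0 replaced by the constraint h[0] = 0.
    B = [[float(s == t) - Ppi[s][t] for t in range(n)] for s in range(n)]
    B[0] = [1.0] + [0.0] * (n - 1)
    h = solve(B, [0.0] + [rpi[s] - rho for s in range(1, n)])
    return rho, d, h


def project_simplex(y):
    """Euclidean projection of the vector y onto the probability simplex."""
    u = sorted(y, reverse=True)
    total, tau = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        total += uk
        if uk - (total - 1.0) / k > 0:
            tau = (total - 1.0) / k
    return [max(yi - tau, 0.0) for yi in y]


def robust_projected_pg(kernels, r, pi, step=1.0, iters=200):
    """Projected gradient ascent on the worst-case gain over a finite kernel set."""
    n, acts = len(r), range(len(r[0]))
    for _ in range(iters):
        # Inner problem: worst-case kernel, found exactly by enumeration.
        evals = [evaluate(P, r, pi) for P in kernels]
        k = min(range(len(kernels)), key=lambda i: evals[i][0])
        rho, d, h = evals[k]
        P = kernels[k]
        # Outer step: gradient d(s) * q(s, a) at the worst kernel, then project each row.
        new_pi = []
        for s in range(n):
            q = [r[s][a] - rho + sum(P[s][a][t] * h[t] for t in range(n)) for a in acts]
            new_pi.append(project_simplex([pi[s][a] + step * d[s] * q[a] for a in acts]))
        pi = new_pi
    worst = min(evaluate(P, r, pi)[0] for P in kernels)
    return pi, worst


# Hypothetical 2-state, 2-action instance: state 0 pays 1, state 1 pays 0.
# KERNELS[k][s][a] is the next-state distribution under model k.
REWARDS = [[1.0, 1.0], [0.0, 0.0]]
KERNELS = [
    [[[0.9, 0.1], [0.2, 0.8]], [[0.9, 0.1], [0.2, 0.8]]],  # model A
    [[[0.7, 0.3], [0.1, 0.9]], [[0.7, 0.3], [0.1, 0.9]]],  # model B
]

if __name__ == "__main__":
    pi, worst = robust_projected_pg(KERNELS, REWARDS, [[0.5, 0.5], [0.5, 0.5]])
    print(pi, round(worst, 4))  # final policy and its worst-case gain
```

On this toy instance, model B has the lower gain for every policy, and the iterates move from the uniform policy (worst-case gain 0.4) to always choosing action 0, whose worst-case gain is 0.7. Because projecting onto the simplex is unaffected by adding the same constant to all of a state's entries, the choice of normalization for the bias h does not change the update.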

Cite

Text

Wang et al. "Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Wang et al. "Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/wang2025icml-provable-a/)

BibTeX

@inproceedings{wang2025icml-provable-a,
  title     = {{Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity}},
  author    = {Wang, Qiuhao and Zha, Yuqi and Ho, Chin Pang and Petrik, Marek},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {65368-65399},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/wang2025icml-provable-a/}
}