Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity
Abstract
Robust Markov Decision Processes (MDPs) offer a promising framework for computing reliable policies under model uncertainty. While policy gradient methods have gained increasing popularity in robust discounted MDPs, their application to the average-reward criterion remains largely unexplored. This paper proposes Robust Projected Policy Gradient (RP2G), the first generic policy gradient method for robust average-reward MDPs (RAMDPs) that is applicable beyond the typical rectangularity assumption on transition ambiguity. In contrast to existing robust policy gradient algorithms, RP2G incorporates an adaptive decreasing tolerance mechanism for efficient policy updates at each iteration. We also present a comprehensive convergence analysis of RP2G for solving ergodic tabular RAMDPs. Furthermore, we present the first study of the inner worst-case transition evaluation problem in RAMDPs, proposing two gradient-based algorithms, one tailored for rectangular ambiguity sets and one for general ambiguity sets, each with provable convergence guarantees. Numerical experiments confirm the global convergence of our new algorithm and demonstrate its superior performance.
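As a rough illustration of the objective the abstract refers to, the sketch below evaluates the long-run average reward of a fixed policy in a small ergodic tabular MDP and then takes the worst case over an ambiguity set. It is not the RP2G algorithm or either of the paper's inner-problem solvers: the ambiguity set is assumed to be a short finite list of full transition kernels (which is non-rectangular by construction), the adversary is assumed to minimize reward, and the names average_reward, worst_case_average_reward, P_nominal and P_shifted are hypothetical.

```python
import numpy as np

def average_reward(P, r, pi):
    """Long-run average reward of policy pi under kernel P.

    P: (S, A, S) transition kernel, r: (S, A) rewards, pi: (S, A) policy.
    Assumes the induced Markov chain is ergodic (unique stationary distribution).
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state matrix under pi
    r_pi = (pi * r).sum(axis=1)             # expected one-step reward per state
    # Stationary distribution d solves d^T P_pi = d^T with sum(d) = 1.
    A = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    b = np.concatenate([np.zeros(S), [1.0]])
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(d @ r_pi)

def worst_case_average_reward(kernels, r, pi):
    """Inner problem by enumeration over a finite (assumed) ambiguity set."""
    return min(average_reward(P, r, pi) for P in kernels)

# Toy instance: 2 states, 2 actions, uniform policy.
r = np.array([[1.0, 0.0], [0.0, 0.0]])
pi = np.full((2, 2), 0.5)
P_nominal = np.full((2, 2, 2), 0.5)                 # every move is uniform
P_shifted = np.tile(np.array([0.2, 0.8]), (2, 2, 1))  # mass pushed to state 1
robust_value = worst_case_average_reward([P_nominal, P_shifted], r, pi)
```

Enumeration only works because the assumed set is finite; for the continuous rectangular or general ambiguity sets discussed in the abstract, the inner minimization would itself need an iterative method.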
Cite
Text
Wang et al. "Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Wang et al. "Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/wang2025icml-provable-a/)
BibTeX
@inproceedings{wang2025icml-provable-a,
title = {{Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity}},
author = {Wang, Qiuhao and Zha, Yuqi and Ho, Chin Pang and Petrik, Marek},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {65368--65399},
volume = {267},
url = {https://mlanthology.org/icml/2025/wang2025icml-provable-a/}
}