Projection-Based Lyapunov Method for Fully Heterogeneous Weakly-Coupled MDPs

Abstract

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the _fully heterogeneous_ setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the _first asymptotic optimality result_ for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.

Cite

Text

Zhang et al. "Projection-Based Lyapunov Method for Fully Heterogeneous Weakly-Coupled MDPs." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhang et al. "Projection-Based Lyapunov Method for Fully Heterogeneous Weakly-Coupled MDPs." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-projectionbased/)

BibTeX

@inproceedings{zhang2025neurips-projectionbased,
  title     = {{Projection-Based Lyapunov Method for Fully Heterogeneous Weakly-Coupled MDPs}},
  author    = {Zhang, XiangCheng and Hong, Yige and Wang, Weina},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhang2025neurips-projectionbased/}
}