Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games

Chen, Dingyang; Zeng, Xiaoling; Doan, Thinh T.; Zhang, Qi

TMLR 2026

Abstract

Simultaneous gradient updates are widely used in multi-agent learning. However, this approach introduces non-stationarity from each agent's perspective, because the other agents' policies evolve at the same time. To address this issue, we consider best-response dynamics, where only one agent updates its policy at a time. We theoretically show that with best-response dynamics, convergence results from single-agent reinforcement learning extend to Markov potential games (MPGs). Moreover, building on the concepts of price of anarchy and smoothness from normal-form games, we aim to find policies in MPGs that achieve optimal cooperation and provide the first known suboptimality guarantees for policy gradient variants under best-response dynamics. Empirical results demonstrate that best-response dynamics significantly improve cooperation across policy gradient variants in both classic and more complex games.
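
To make the contrast between simultaneous updates and best-response dynamics concrete, here is a minimal sketch under simplifying assumptions. It uses a hypothetical two-player identical-interest normal-form game (a special case of a potential game, with the shared payoff as the potential) and exact best responses in place of the policy gradient updates the paper analyzes in Markov potential games. The payoff matrix `PAYOFF` and the functions `best_response`, `simultaneous`, and `sequential` are illustrative names, not taken from the paper.

```python
# Hypothetical 2x2 identical-interest game: both agents receive PAYOFF[a1][a2].
# The shared payoff also serves as the potential function of this game.
PAYOFF = [[2.0, 0.0],
          [0.0, 1.0]]
ACTIONS = range(2)


def best_response(agent, joint):
    """Action maximizing the shared payoff for `agent`, holding the other agent fixed."""
    if agent == 0:
        return max(ACTIONS, key=lambda a: PAYOFF[a][joint[1]])
    return max(ACTIONS, key=lambda a: PAYOFF[joint[0]][a])


def simultaneous(joint, steps):
    """Both agents best-respond to the previous joint action at the same time."""
    history = [joint]
    for _ in range(steps):
        joint = (best_response(0, joint), best_response(1, joint))
        history.append(joint)
    return history


def sequential(joint, steps):
    """Best-response dynamics: agents take turns, and only one updates per step."""
    history = [joint]
    for t in range(steps):
        agent = t % 2
        actions = list(joint)
        actions[agent] = best_response(agent, joint)
        joint = tuple(actions)
        history.append(joint)
    return history


print(simultaneous((0, 1), 4))  # [(0, 1), (1, 0), (0, 1), (1, 0), (0, 1)]
print(sequential((0, 1), 4))    # [(0, 1), (1, 1), (1, 1), (1, 1), (1, 1)]
```

From the joint action (0, 1), simultaneous updates oscillate between (0, 1) and (1, 0) because each agent responds to an action the other is abandoning at the same moment. Taking turns settles at (1, 1), a Nash equilibrium whose shared payoff of 1 is half of the optimal payoff of 2 at (0, 0); that gap between an equilibrium and the optimum is the kind of inefficiency the price of anarchy quantifies.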

Cite

Text

Chen et al. "Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games." Transactions on Machine Learning Research, 2026.

Markdown

[Chen et al. "Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/chen2026tmlr-analyzing/)

BibTeX

@article{chen2026tmlr-analyzing,
  title     = {{Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games}},
  author    = {Chen, Dingyang and Zeng, Xiaoling and Doan, Thinh T. and Zhang, Qi},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/chen2026tmlr-analyzing/}
}