Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games

Abstract

Simultaneous gradient updates are widely used in multi-agent learning. However, this method introduces non-stationarity from the perspective of each agent due to the co-evolution of other agents' policies. To address this issue, we consider best-response dynamics, where only one agent updates its policy at a time. We theoretically show that with best-response dynamics, convergence results from single-agent reinforcement learning extend to Markov potential games (MPGs). Moreover, building on the concept of price of anarchy and smoothness from normal-form games, we aim to find policies in MPGs that achieve optimal cooperation and provide the first known suboptimality guarantees for policy gradient variants under the best-response dynamics. Empirical results demonstrate that the best-response dynamics significantly improves cooperation across policy gradient variants in classic and more complex games.
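To make the update rule concrete, here is a minimal sketch of best-response dynamics in a toy normal-form setting. It is not the paper's algorithm, which concerns policy gradient variants in Markov potential games. The payoff matrix `R`, the function `best_response_dynamics`, and the starting action profiles are hypothetical choices for illustration. The game is an identical-interest game, a simple special case of a potential game in which the shared payoff serves as the potential function.

```python
import numpy as np

# Hypothetical 2x2 identical-interest game: both agents receive R[a0, a1].
# The shared payoff R acts as the potential function.
R = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def best_response_dynamics(R, start, max_rounds=100):
    """Agents take turns; only one agent changes its action at a time."""
    a = list(start)
    for _ in range(max_rounds):
        changed = False
        for i in range(2):
            # Payoff of each action of agent i, with the other agent held fixed.
            payoffs = R[:, a[1]] if i == 0 else R[a[0], :]
            best = int(np.argmax(payoffs))
            # Switch only on a strict improvement, so the potential strictly increases.
            if payoffs[best] > payoffs[a[i]]:
                a[i] = best
                changed = True
        if not changed:
            break  # no agent can improve unilaterally: a pure Nash equilibrium
    return tuple(a)

for start in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    end = best_response_dynamics(R, start)
    print(start, "->", end, "payoff", R[end])
```

Because each agent faces a fixed opponent while it updates, every switch strictly raises the potential, so the loop terminates at a pure Nash equilibrium. Starting from `(0, 1)` or `(1, 1)`, it settles at `(1, 1)` with payoff 1 rather than the optimal 2, which is the kind of gap that price-of-anarchy style guarantees aim to bound.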

Cite

Text

Chen et al. "Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games." Transactions on Machine Learning Research, 2026.

Markdown

[Chen et al. "Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/chen2026tmlr-analyzing/)

BibTeX

@article{chen2026tmlr-analyzing,
  title     = {{Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games}},
  author    = {Chen, Dingyang and Zeng, Xiaoling and Doan, Thinh T. and Zhang, Qi},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/chen2026tmlr-analyzing/}
}