Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games
Abstract
Simultaneous gradient updates are widely used in multi-agent learning. From each agent's perspective, however, they make the environment non-stationary, because the other agents' policies evolve at the same time. To address this issue, we consider best-response dynamics, where only one agent updates its policy at a time. We theoretically show that with best-response dynamics, convergence results from single-agent reinforcement learning extend to Markov potential games (MPGs). Moreover, building on the concepts of price of anarchy and smoothness from normal-form games, we aim to find policies in MPGs that achieve optimal cooperation and provide the first known suboptimality guarantees for policy gradient variants under best-response dynamics. Empirical results demonstrate that best-response dynamics significantly improve cooperation across policy gradient variants in classic and more complex games.
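As a rough illustration of the "one agent updates at a time" idea, the sketch below runs exact best responses in a two-player identical-interest matrix game, a simple special case of a potential game. It is not the paper's algorithm, which concerns policy gradient variants in MPGs. The function name `best_response_dynamics`, the payoff matrix, and the starting actions are all hypothetical choices made for this example.

```python
import numpy as np


def best_response_dynamics(payoff, start=(0, 0), max_rounds=100):
    """Alternate exact best responses in a two-player identical-interest game.

    payoff[i, j] is the shared reward when agent 0 plays i and agent 1 plays j.
    (Illustrative assumption: both agents receive the same payoff.)
    """
    actions = list(start)
    for _ in range(max_rounds):
        changed = False
        for agent in (0, 1):
            # Only this agent moves; the other agent's action is held fixed.
            if agent == 0:
                values = payoff[:, actions[1]]
            else:
                values = payoff[actions[0], :]
            best = int(np.argmax(values))
            # Switch only on a strict improvement, so the shared payoff
            # (the potential here) never decreases.
            if values[best] > values[actions[agent]]:
                actions[agent] = best
                changed = True
        if not changed:
            # No agent can improve unilaterally: a pure Nash equilibrium.
            break
    return tuple(actions)


# Hypothetical coordination game with two equilibria of different quality.
payoff = np.array([[1.0, 0.0],
                   [0.0, 2.0]])

print(best_response_dynamics(payoff, start=(0, 0)))
print(best_response_dynamics(payoff, start=(0, 1)))
```

Started from `(0, 0)`, the dynamics stay at the equilibrium with payoff 1, while starting from `(0, 1)` leads to the equilibrium with payoff 2. This gap between a reachable equilibrium and the best joint outcome is the kind of suboptimality that price-of-anarchy analysis quantifies.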
Cite
Text
Chen et al. "Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games." Transactions on Machine Learning Research, 2026.
Markdown
[Chen et al. "Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/chen2026tmlr-analyzing/)
BibTeX
@article{chen2026tmlr-analyzing,
title = {{Analyzing Best-Response Dynamics for Cooperation in Markov Potential Games}},
author = {Chen, Dingyang and Zeng, Xiaoling and Doan, Thinh T. and Zhang, Qi},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/chen2026tmlr-analyzing/}
}