Policy-Regret Minimization in Markov Games with Function Approximation
Abstract
We study the policy-regret minimization problem in dynamically evolving environments, modeled as Markov games between a learner and a strategic, adaptive opponent. We propose a general algorithmic framework that achieves the optimal $\mathcal{O}(\sqrt{T})$ policy regret for a wide class of large-scale problems characterized by an Eluder-type condition, extending beyond the tabular settings of previous work. Importantly, our framework uncovers a simpler yet powerful algorithmic approach for handling reactive adversaries, demonstrating that leveraging the opponent's learning behavior in such settings is key to attaining the optimal $\mathcal{O}(\sqrt{T})$ policy regret.
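For context, policy regret differs from standard (external) regret in that the comparator accounts for how the adaptive opponent would have reacted had the learner committed to a single fixed policy from the start. A common formalization in episodic Markov games is sketched below; the notation ($V$, $\nu_t$) is ours and the paper's exact definition may differ in detail. Here $V^{\pi,\nu}$ denotes the learner's expected return when it plays $\pi$ and the opponent plays $\nu$, and $\nu_t(\cdot)$ is the opponent's (possibly learning-based) response to the learner's history of policies:

$$
\mathrm{PolicyRegret}(T) \;=\; \max_{\pi} \sum_{t=1}^{T} V^{\pi,\; \nu_t(\pi,\dots,\pi)} \;-\; \sum_{t=1}^{T} V^{\pi_t,\; \nu_t(\pi_1,\dots,\pi_{t-1})}.
$$

Under this notion, an $\mathcal{O}(\sqrt{T})$ bound means the learner's cumulative return approaches, at the usual parametric rate, what it could have earned by playing the best fixed policy against the opponent's counterfactual responses to that policy.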
Cite
Text
Nguyen-Tang and Arora. "Policy-Regret Minimization in Markov Games with Function Approximation." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Nguyen-Tang and Arora. "Policy-Regret Minimization in Markov Games with Function Approximation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/nguyentang2025icml-policyregret/)
BibTeX
@inproceedings{nguyentang2025icml-policyregret,
title = {{Policy-Regret Minimization in Markov Games with Function Approximation}},
author = {Nguyen-Tang, Thanh and Arora, Raman},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {46242--46264},
volume = {267},
url = {https://mlanthology.org/icml/2025/nguyentang2025icml-policyregret/}
}