Policy-Regret Minimization in Markov Games with Function Approximation

Abstract

We study the policy-regret minimization problem in dynamically evolving environments, modeled as Markov games between a learner and a strategic, adaptive opponent. We propose a general algorithmic framework that achieves the optimal $\mathcal{O}(\sqrt{T})$ policy regret for a wide class of large-scale problems characterized by an Eluder-type condition, extending beyond the tabular settings of previous work. Importantly, our framework uncovers a simpler yet powerful algorithmic approach for handling reactive adversaries, demonstrating that leveraging the opponent's learning in such settings is key to attaining the optimal $\mathcal{O}(\sqrt{T})$ policy regret.
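
For context, here is a minimal sketch of the policy-regret notion referenced above, following the standard definition from the online-learning literature (the paper's exact formalization, including how the opponent's memory and adaptivity are modeled, may differ):

$$
\mathrm{PolicyReg}(T) \;=\; \max_{\pi \in \Pi} \, \mathbb{E}\!\left[\sum_{t=1}^{T} V^{\pi,\,\nu_t(\pi,\dots,\pi)}\right] \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} V^{\pi_t,\,\nu_t(\pi_1,\dots,\pi_{t-1})}\right],
$$

where $\pi_t$ is the learner's policy in episode $t$, $\nu_t(\cdot)$ is the adaptive opponent's response to the learner's past policies, and $V^{\pi,\nu}$ is the learner's expected return when the two sides play $\pi$ and $\nu$. Unlike external regret, the comparator term is counterfactual: it measures the return the learner would have obtained by committing to a fixed policy $\pi$ for all $T$ episodes, with the opponent reacting to that fixed sequence.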

Cite

Text

Nguyen-Tang and Arora. "Policy-Regret Minimization in Markov Games with Function Approximation." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Nguyen-Tang and Arora. "Policy-Regret Minimization in Markov Games with Function Approximation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/nguyentang2025icml-policyregret/)

BibTeX

@inproceedings{nguyentang2025icml-policyregret,
  title     = {{Policy-Regret Minimization in Markov Games with Function Approximation}},
  author    = {Nguyen-Tang, Thanh and Arora, Raman},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {46242--46264},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/nguyentang2025icml-policyregret/}
}