Polynomial Convergence of Bandit No-Regret Dynamics in Congestion Games

Abstract

We present an online learning algorithm in the bandit feedback model that, once adopted by all agents of a congestion game, results in game dynamics that converge to an $\epsilon$-approximate Nash Equilibrium in a number of rounds that is polynomial in $1/\epsilon$, the number of players, and the number of available resources. The proposed algorithm also guarantees sublinear regret to any agent adopting it. As a result, our work answers an open question of Cui et al. (2022) and extends the recent results of Panageas et al. (2023) to the bandit feedback model. Our algorithm can be implemented in *polynomial time* for the important special case of Network Congestion Games on Directed Acyclic Graphs (DAGs), since barycentric spanners can be efficiently constructed in this case. We conclude by proposing a natural, exact, $1$-barycentric spanner construction for DAGs.
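For readers unfamiliar with barycentric spanners: a $1$-barycentric spanner of a set of vectors is a maximal linearly independent subset such that every vector in the set can be written as a combination of the subset with coefficients in $[-1, 1]$. The sketch below is *not* the paper's construction; it is a brute-force background illustration using the classical fact (Awerbuch and Kleinberg) that a maximum-volume subset of path vectors is an exact $1$-barycentric spanner. The DAG, vertex labels, and helper names are our own illustrative choices.

```python
import itertools
import numpy as np

def st_paths(edges, s, t):
    """Enumerate all s-t paths in a DAG given as a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    paths = []
    def dfs(u, path):
        if u == t:
            paths.append(list(path))
            return
        for v in adj.get(u, []):
            dfs(v, path + [v])
    dfs(s, [s])
    return paths

def path_vectors(edges, paths):
    """Edge-indicator vector in R^m for each path (m = number of edges)."""
    idx = {e: i for i, e in enumerate(edges)}
    vecs = np.zeros((len(paths), len(edges)))
    for k, p in enumerate(paths):
        for e in zip(p, p[1:]):
            vecs[k, idx[e]] = 1.0
    return vecs

def brute_force_spanner(vecs):
    """Return indices of a max-volume subset of the path vectors.

    Such a subset is an exact 1-barycentric spanner of the whole set:
    by Cramer's rule, any coefficient with magnitude > 1 would yield a
    larger-volume subset after a swap, contradicting maximality.
    (Exponential-time illustration only; the paper gives an efficient
    construction for DAGs.)
    """
    r = np.linalg.matrix_rank(vecs)
    best, best_vol = None, -1.0
    for subset in itertools.combinations(range(len(vecs)), r):
        B = vecs[list(subset)]
        vol = np.sqrt(max(np.linalg.det(B @ B.T), 0.0))
        if vol > best_vol:
            best, best_vol = subset, vol
    return best

# Demo on a small complete DAG over vertices 0..4 (s = 0, t = 4):
# it has 8 s-t paths but their vectors span only a 7-dimensional space,
# so the spanner is a strict subset of the paths.
if __name__ == "__main__":
    edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
    paths = st_paths(edges, 0, 4)
    vecs = path_vectors(edges, paths)
    spanner = brute_force_spanner(vecs)
    B = vecs[list(spanner)]
    for v in vecs:
        c, *_ = np.linalg.lstsq(B.T, v, rcond=None)
        print(np.max(np.abs(c)))  # every path has coefficients in [-1, 1]
```

The linear-optimization oracle that an efficient spanner construction repeatedly calls (maximizing a linear objective over paths) reduces, on a DAG, to a longest-path computation solvable by dynamic programming, which is why the DAG case is tractable.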

Cite

Text

Dadi et al. "Polynomial Convergence of Bandit No-Regret Dynamics in Congestion Games." ICML 2024 Workshops: Agentic_Markets, 2024.

Markdown

[Dadi et al. "Polynomial Convergence of Bandit No-Regret Dynamics in Congestion Games." ICML 2024 Workshops: Agentic_Markets, 2024.](https://mlanthology.org/icmlw/2024/dadi2024icmlw-polynomial/)

BibTeX

@inproceedings{dadi2024icmlw-polynomial,
  title     = {{Polynomial Convergence of Bandit No-Regret Dynamics in Congestion Games}},
  author    = {Dadi, Leello Tadesse and Panageas, Ioannis and Skoulakis, Stratis and Viano, Luca and Cevher, Volkan},
  booktitle = {ICML 2024 Workshops: Agentic_Markets},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/dadi2024icmlw-polynomial/}
}