Efficient Learning in Polyhedral Games via Best-Response Oracles

Chakrabarti, Darshan; Farina, Gabriele; Kroer, Christian

doi:10.1609/AAAI.V38I9.28812

Efficient Learning in Polyhedral Games via Best-Response Oracles

Darshan Chakrabarti, Gabriele Farina, Christian Kroer

AAAI 2024 pp. 9564-9572

doi:10.1609/AAAI.V38I9.28812 /aaai/2024/chakrabarti2024aaai-efficient/

Abstract

We study online learning and equilibrium computation in games with polyhedral decision sets, a property shared by normal-form games (NFGs) and extensive-form games (EFGs), when the learning agent is restricted to utilizing a best-response oracle. We show how to achieve constant regret in zero-sum games and O(T^0.25) regret in general-sum games while using only O(log t) best-response queries at a given iteration t, thus improving over the best prior result, which required O(T) queries per iteration. Moreover, our framework yields the first last-iterate convergence guarantees for self-play with best-response oracles in zero-sum games. This convergence occurs at a linear rate, though with a condition-number dependence. We go on to show a O(T^(-0.5)) best-iterate convergence rate without such a dependence. Our results build on linear-rate convergence results for variants of the Frank-Wolfe (FW) algorithm for strongly convex and smooth minimization problems over polyhedral domains. These FW results depend on a condition number of the polytope, known as facial distance. In order to enable application to settings such as EFGs, we show two broad new results: 1) the facial distance for polytopes in standard form is at least γ/k where γ is the minimum value of a nonzero coordinate of a vertex of the polytope and k≤n is the number of tight inequality constraints in the optimal face, and 2) the facial distance for polytopes of the form Ax=b, Cx≤d, x≥0 where x∈R^n, C≥0 is a nonzero integral matrix, and d≥0, is at least 1/(c√n), where c is the infinity norm of C. This yields the first such results for several problems such as sequence-form polytopes, flow polytopes, and matching polytopes.

PDF AAAI Semantic Scholar

Cite

Text

Chakrabarti et al. "Efficient Learning in Polyhedral Games via Best-Response Oracles." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I9.28812

Markdown

[Chakrabarti et al. "Efficient Learning in Polyhedral Games via Best-Response Oracles." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/chakrabarti2024aaai-efficient/) doi:10.1609/AAAI.V38I9.28812

BibTeX

@inproceedings{chakrabarti2024aaai-efficient,
  title     = {{Efficient Learning in Polyhedral Games via Best-Response Oracles}},
  author    = {Chakrabarti, Darshan and Farina, Gabriele and Kroer, Christian},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {9564-9572},
  doi       = {10.1609/AAAI.V38I9.28812},
  url       = {https://mlanthology.org/aaai/2024/chakrabarti2024aaai-efficient/}
}