Alternation Makes the Adversary Weaker in Two-Player Games

Abstract

Motivated by alternating game-play in two-player games, we study an altenating variant of the \textit{Online Linear Optimization} (OLO). In alternating OLO, a \textit{learner} at each round $t \in [n]$ selects a vector $x^t$ and then an \textit{adversary} selects a cost-vector $c^t \in [-1,1]^n$. The learner then experiences cost $(c^t + c^{t-1})^\top x^t$ instead of $(c^t)^\top x^t$ as in standard OLO. We establish that under this small twist, the $\Omega(\sqrt{T})$ lower bound on the regret is no longer valid. More precisely, we present two online learning algorithms for alternating OLO that respectively admit $\mathcal{O}((\log n)^{4/3} T^{1/3})$ regret for the $n$-dimensional simplex and $\mathcal{O}(\rho \log T)$ regret for the ball of radius $\rho>0$. Our results imply that in alternating game-play, an agent can always guarantee $\mathcal{\tilde{O}}((\log n)^{4/3} T^{1/3})$ regardless the strategies of the other agent while the regret bound improves to $\mathcal{O}(\log T)$ in case the agent admits only two actions.

Cite

Text

Cevher et al. "Alternation Makes the Adversary Weaker in Two-Player Games." Neural Information Processing Systems, 2023.

Markdown

[Cevher et al. "Alternation Makes the Adversary Weaker in Two-Player Games." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/cevher2023neurips-alternation/)

BibTeX

@inproceedings{cevher2023neurips-alternation,
  title     = {{Alternation Makes the Adversary Weaker in Two-Player Games}},
  author    = {Cevher, Volkan and Cutkosky, Ashok and Kavis, Ali and Piliouras, Georgios and Skoulakis, Stratis and Viano, Luca},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/cevher2023neurips-alternation/}
}