Alternation Makes the Adversary Weaker in Two-Player Games
Abstract
Motivated by alternating game-play in two-player games, we study an altenating variant of the \textit{Online Linear Optimization} (OLO). In alternating OLO, a \textit{learner} at each round $t \in [n]$ selects a vector $x^t$ and then an \textit{adversary} selects a cost-vector $c^t \in [-1,1]^n$. The learner then experiences cost $(c^t + c^{t-1})^\top x^t$ instead of $(c^t)^\top x^t$ as in standard OLO. We establish that under this small twist, the $\Omega(\sqrt{T})$ lower bound on the regret is no longer valid. More precisely, we present two online learning algorithms for alternating OLO that respectively admit $\mathcal{O}((\log n)^{4/3} T^{1/3})$ regret for the $n$-dimensional simplex and $\mathcal{O}(\rho \log T)$ regret for the ball of radius $\rho>0$. Our results imply that in alternating game-play, an agent can always guarantee $\mathcal{\tilde{O}}((\log n)^{4/3} T^{1/3})$ regardless the strategies of the other agent while the regret bound improves to $\mathcal{O}(\log T)$ in case the agent admits only two actions.
Cite
Text
Cevher et al. "Alternation Makes the Adversary Weaker in Two-Player Games." Neural Information Processing Systems, 2023.Markdown
[Cevher et al. "Alternation Makes the Adversary Weaker in Two-Player Games." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/cevher2023neurips-alternation/)BibTeX
@inproceedings{cevher2023neurips-alternation,
title = {{Alternation Makes the Adversary Weaker in Two-Player Games}},
author = {Cevher, Volkan and Cutkosky, Ashok and Kavis, Ali and Piliouras, Georgios and Skoulakis, Stratis and Viano, Luca},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/cevher2023neurips-alternation/}
}