No Weighted-Regret Learning in Adversarial Bandits with Delays
Abstract
Consider a scenario where a player chooses an action in each round $t$ out of $T$ rounds and observes the incurred cost after a delay of $d_{t}$ rounds. The cost functions and the delay sequence are chosen by an adversary. We show that in a non-cooperative game, the expected weighted ergodic distribution of play converges to the set of coarse correlated equilibria if players use algorithms that have “no weighted-regret” in the above scenario, even if they have linear regret due to too large delays. For a two-player zero-sum game, we show that no weighted-regret is sufficient for the weighted ergodic average of play to converge to the set of Nash equilibria. We prove that the FKM algorithm with $n$ dimensions achieves an expected regret of $O\left(nT^{\frac{3}{4}}+\sqrt{n}T^{\frac{1}{3}}D^{\frac{1}{3}}\right)$ and the EXP3 algorithm with $K$ arms achieves an expected regret of $O\left(\sqrt{\log K\left(KT+D\right)}\right)$ even when $D=\sum_{t=1}^{T}d_{t}$ and $T$ are unknown. These bounds use a novel doubling trick that, under mild assumptions, provably retains the regret bound for when $D$ and $T$ are known. Using these bounds, we show that FKM and EXP3 have no weighted-regret even for $d_{t}=O\left(t\log t\right)$. Therefore, algorithms with no weighted-regret can be used to approximate a CCE of a finite or convex unknown game that can only be simulated with bandit feedback, even if the simulation involves significant delays.
Cite
Text
Bistritz et al. "No Weighted-Regret Learning in Adversarial Bandits with Delays." Journal of Machine Learning Research, 2022.Markdown
[Bistritz et al. "No Weighted-Regret Learning in Adversarial Bandits with Delays." Journal of Machine Learning Research, 2022.](https://mlanthology.org/jmlr/2022/bistritz2022jmlr-weightedregret/)BibTeX
@article{bistritz2022jmlr-weightedregret,
title = {{No Weighted-Regret Learning in Adversarial Bandits with Delays}},
author = {Bistritz, Ilai and Zhou, Zhengyuan and Chen, Xi and Bambos, Nicholas and Blanchet, Jose},
journal = {Journal of Machine Learning Research},
year = {2022},
pages = {1-43},
volume = {23},
url = {https://mlanthology.org/jmlr/2022/bistritz2022jmlr-weightedregret/}
}