A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Abstract

We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O} (\sqrt{\alpha T})$, where $T$ is the time horizon and $\alpha$ is the independence number of the feedback graph. The bound against stochastic environments is $O\big((\ln T)^2 \max_{S\in \mathcal I(G)} \sum_{i \in S} \Delta_i^{-1}\big)$, where $\mathcal I(G)$ is the family of all independent sets in a suitably defined undirected version of the graph and $\Delta_i$ are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtaining best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.
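To make the two graph-dependent quantities in these bounds concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It computes the independence number $\alpha$ and the quantity $\max_{S\in \mathcal I(G)} \sum_{i \in S} \Delta_i^{-1}$ for a small example. Two assumptions are made here that the abstract does not confirm: the undirected version of the graph joins $i$ and $j$ whenever a directed edge exists in either direction, and actions with $\Delta_i = 0$ (the optimal ones) are skipped in the sum. The function names, the example graph, and the gap values are all hypothetical.

```python
from itertools import combinations


def undirected_edges(directed_edges):
    # ASSUMPTION: symmetrize by joining u and v if either u->v or v->u exists.
    # The abstract only says "a suitably defined undirected version".
    # Self-loops are ignored because they do not affect independence.
    return {frozenset((u, v)) for u, v in directed_edges if u != v}


def independent_sets(n, edges):
    # Enumerate every independent set by brute force (exponential in n,
    # so this is only meant for tiny illustrative graphs).
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in edges for p in combinations(subset, 2)):
                yield subset


def independence_number(n, edges):
    # alpha: size of the largest independent set.
    return max(len(s) for s in independent_sets(n, edges))


def gap_complexity(n, edges, gaps):
    # max over independent sets S of the sum of 1 / Delta_i for i in S.
    # ASSUMPTION: actions with zero gap are excluded from the sum.
    return max(
        sum(1.0 / gaps[i] for i in s if gaps[i] > 0)
        for s in independent_sets(n, edges)
    )


# Hypothetical example: four actions on a directed path 0 -> 1 -> 2 -> 3.
directed = [(0, 1), (1, 2), (2, 3)]
gaps = [0.0, 0.5, 0.25, 0.1]  # action 0 is optimal in this made-up instance

edges = undirected_edges(directed)
alpha = independence_number(4, edges)
complexity = gap_complexity(4, edges, gaps)
print(alpha, complexity)
```

In this example the maximizing independent set is not simply the set with the smallest gaps: actions 2 and 3 are adjacent, so they cannot both contribute, which is how the graph structure can shrink the stochastic bound relative to the standard bandit sum over all suboptimal actions.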

Cite

Text

Rouyer et al. "A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs." Neural Information Processing Systems, 2022.

Markdown

[Rouyer et al. "A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/rouyer2022neurips-nearoptimal/)

BibTeX

@inproceedings{rouyer2022neurips-nearoptimal,
  title     = {{A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs}},
  author    = {Rouyer, Chloé and van der Hoeven, Dirk and Cesa-Bianchi, Nicolò and Seldin, Yevgeny},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/rouyer2022neurips-nearoptimal/}
}