Problem-Dependent Regret Bounds for Online Learning with Feedback Graphs

Abstract

This paper addresses the stochastic multi-armed bandit problem with an undirected feedback graph. We devise a UCB-based algorithm, UCB-NE, to provide a problem-dependent regret bound that depends on a clique covering of the feedback graph; the regret provably scales linearly with the clique covering number. Additionally, we provide problem-dependent regret bounds for a Thompson Sampling-based algorithm, TS-N, which are again linear in the clique covering number. Finally, we present experimental results comparing how UCB-NE, TS-N, and several related algorithms perform in practice.
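For readers unfamiliar with the feedback-graph setting, the sketch below illustrates the core idea: pulling an arm also reveals the rewards of its graph neighbors, so an index-based strategy like UCB can update estimates for every observed arm, not just the one played. This is a minimal illustrative sketch under assumed Bernoulli rewards and an assumed small adjacency structure; it is not the paper's UCB-NE algorithm, its exact confidence bounds, or its clique-based analysis.

# Illustrative sketch only: UCB-style play on a stochastic bandit with an
# undirected feedback graph. The graph, reward means, and exploration
# constant below are assumptions for illustration, not from the paper.
import math
import random

def ucb_graph_bandit(graph, pull_arm, horizon, c=2.0):
    """graph: dict arm -> set of neighboring arms (undirected feedback graph).
    pull_arm: callback returning {arm: observed reward} for the pulled arm
              and all of its neighbors (the graph feedback)."""
    arms = list(graph)
    counts = {a: 0 for a in arms}   # number of observations per arm
    sums = {a: 0.0 for a in arms}   # cumulative observed reward per arm
    for t in range(1, horizon + 1):
        unobserved = [a for a in arms if counts[a] == 0]
        if unobserved:
            # Play any never-observed arm first.
            chosen = random.choice(unobserved)
        else:
            # Otherwise maximize the UCB index built from all observations,
            # including side observations from neighbors.
            chosen = max(
                arms,
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(t) / counts[a]),
            )
        for arm, reward in pull_arm(chosen).items():
            counts[arm] += 1
            sums[arm] += reward
    return counts, sums

if __name__ == "__main__":
    # Hypothetical 3-arm instance where arm "a" is best and "b" links "a", "c".
    means = {"a": 0.9, "b": 0.5, "c": 0.4}
    graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    def pull(arm):
        observed = {arm} | graph[arm]
        return {a: float(random.random() < means[a]) for a in observed}
    counts, sums = ucb_graph_bandit(graph, pull, horizon=2000)
    print({a: round(sums[a] / counts[a], 2) for a in counts})

Because every pull of "b" also reveals "a" and "c", the learner needs far fewer exploratory pulls than in the standard bandit setting; the paper's bounds make this precise by charging regret per clique of the covering rather than per arm.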

Cite

Text

Hu et al. "Problem-Dependent Regret Bounds for Online Learning with Feedback Graphs." Uncertainty in Artificial Intelligence, 2019.

Markdown

[Hu et al. "Problem-Dependent Regret Bounds for Online Learning with Feedback Graphs." Uncertainty in Artificial Intelligence, 2019.](https://mlanthology.org/uai/2019/hu2019uai-problemdependent/)

BibTeX

@inproceedings{hu2019uai-problemdependent,
  title     = {{Problem-Dependent Regret Bounds for Online Learning with Feedback Graphs}},
  author    = {Hu, Bingshan and Mehta, Nishant A. and Pan, Jianping},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2019},
  pages     = {852--861},
  volume    = {115},
  url       = {https://mlanthology.org/uai/2019/hu2019uai-problemdependent/}
}