Problem-Dependent Regret Bounds for Online Learning with Feedback Graphs
Abstract
This paper addresses the stochastic multi-armed bandit problem with an undirected feedback graph. We devise a UCB-based algorithm, UCB-NE, that achieves a problem-dependent regret bound depending on a clique covering of the graph. Our algorithm obtains regret that provably scales linearly with the clique covering number. Additionally, we provide problem-dependent regret bounds for a Thompson Sampling-based algorithm, TS-N, where again the bounds are linear in the clique covering number. Finally, we present experimental results comparing how UCB-NE, TS-N, and several related algorithms perform in practice.
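The central mechanism behind algorithms like UCB-NE is side observation through an undirected feedback graph: pulling an arm also reveals the rewards of its graph neighbors, so several arms can be updated per round. The sketch below is a minimal, generic illustration of this mechanism, not the paper's UCB-NE; the graph, arm means, and confidence width are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a UCB-style bandit with an
# undirected feedback graph. Pulling an arm reveals its own reward and
# the rewards of all its neighbors, and every observed arm is updated.
import numpy as np

rng = np.random.default_rng(0)

K = 5                                          # number of arms
means = np.array([0.2, 0.4, 0.5, 0.6, 0.8])    # hypothetical Bernoulli means
# Undirected feedback graph as adjacency sets (self-loops included, since
# pulling an arm always reveals its own reward).
neighbors = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}, 3: {3, 4}, 4: {3, 4}}

T = 10_000
counts = np.zeros(K)   # number of times each arm has been observed
sums = np.zeros(K)     # running sum of observed rewards per arm
regret = 0.0

for t in range(1, T + 1):
    # Optimistic index: empirical mean plus a UCB-style confidence width;
    # arms never observed get an infinite index so they are tried first.
    safe_counts = np.maximum(counts, 1)
    ucb = np.where(
        counts > 0,
        sums / safe_counts + np.sqrt(2 * np.log(t) / safe_counts),
        np.inf,
    )
    arm = int(np.argmax(ucb))

    regret += means.max() - means[arm]

    # Graph feedback: observe the pulled arm and all of its neighbors.
    for j in neighbors[arm]:
        reward_j = rng.binomial(1, means[j])
        counts[j] += 1
        sums[j] += reward_j

print(f"cumulative pseudo-regret after {T} rounds: {regret:.1f}")
```

A Thompson Sampling variant in the spirit of TS-N would replace the optimistic index with a posterior sample per arm (e.g., Beta posteriors for Bernoulli rewards) while keeping the same neighborhood-update step.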
Cite
Text
Hu et al. "Problem-Dependent Regret Bounds for Online Learning with Feedback Graphs." Uncertainty in Artificial Intelligence, 2019.Markdown
[Hu et al. "Problem-Dependent Regret Bounds for Online Learning with Feedback Graphs." Uncertainty in Artificial Intelligence, 2019.](https://mlanthology.org/uai/2019/hu2019uai-problemdependent/)BibTeX
@inproceedings{hu2019uai-problemdependent,
  title     = {{Problem-Dependent Regret Bounds for Online Learning with Feedback Graphs}},
  author    = {Hu, Bingshan and Mehta, Nishant A. and Pan, Jianping},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2019},
  pages     = {852--861},
  volume    = {115},
  url       = {https://mlanthology.org/uai/2019/hu2019uai-problemdependent/}
}