Online Learning with Erdos-Renyi Side-Observation Graphs
Abstract
We consider adversarial multi-armed bandit problems where the learner is allowed to observe the losses of a number of arms besides the arm that it actually chose. We study the case where each non-chosen arm reveals its loss with an unknown probability r_t, independently of the other arms and of the action of the learner. Moreover, we allow r_t to change in every round t, which rules out the possibility of estimating r_t by a well-concentrated sample average. We propose an algorithm which operates under the assumption that r_t is large enough to warrant at least one side observation with high probability. We show that after T rounds in a bandit problem with N arms, the expected regret of our algorithm is of order O(\sqrt{\sum_{t=1}^{T} (1/r_t) \log N}), given that r_t \ge \log T / (2N-2) for all t. All our bounds are within logarithmic factors of the best achievable performance of any algorithm that is even allowed to know the exact values of r_t.
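The feedback model in the abstract can be illustrated with a short simulation sketch: in each round, the learner always observes the loss of its chosen arm, and every other arm reveals its loss independently with probability r_t. The function name and structure below are our own illustration, not code from the paper.

```python
import random

def observe(losses, chosen, r):
    """One round of the Erdos-Renyi side-observation model described in
    the abstract: the chosen arm's loss is always revealed, and each
    other arm reveals its loss independently with probability r.
    Returns a dict mapping observed arm indices to their losses.
    (Illustrative sketch; the names here are assumptions, not the
    paper's notation.)"""
    observations = {chosen: losses[chosen]}
    for arm in range(len(losses)):
        if arm != chosen and random.random() < r:
            observations[arm] = losses[arm]
    return observations
```

For example, with r = 0 this reduces to standard bandit feedback (only the chosen arm is observed), while r = 1 recovers full-information feedback.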
Cite
Text
Kocák et al. "Online Learning with Erdos-Renyi Side-Observation Graphs." Conference on Uncertainty in Artificial Intelligence, 2016.

BibTeX
@inproceedings{kocak2016uai-online,
title = {{Online Learning with Erdos-Renyi Side-Observation Graphs}},
author = {Kocák, Tomáš and Neu, Gergely and Valko, Michal},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2016},
url = {https://mlanthology.org/uai/2016/kocak2016uai-online/}
}