Online Learning with Erdos-Renyi Side-Observation Graphs
Abstract
We consider adversarial multi-armed bandit problems where the learner is allowed to observe the losses of a number of arms besides the arm that it actually chose. We study the case where each non-chosen arm reveals its loss with an unknown probability r_t, independently of the other arms and of the action of the learner. Moreover, we allow r_t to change in every round t, which rules out the possibility of estimating r_t by a well-concentrated sample average. We propose an algorithm which operates under the assumption that r_t is large enough to warrant at least one side observation with high probability. We show that after T rounds in a bandit problem with N arms, the expected regret of our algorithm is of order O(\sqrt{\sum_{t=1}^{T} (1/r_t) \log N}), given that r_t \ge \log T / (2N-2) for all t. All our bounds are within logarithmic factors of the best achievable performance of any algorithm that is even allowed to know the exact values of r_t.
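The feedback model in the abstract can be illustrated with a short simulation sketch: in each round, the learner always observes the loss of its chosen arm, and every other arm reveals its loss independently with probability r_t. The function name and structure below are our own illustration, not code from the paper.

```python
import random

def observe(losses, chosen, r):
    """One round of the Erdos-Renyi side-observation model described in
    the abstract: the chosen arm's loss is always revealed, and each
    other arm reveals its loss independently with probability r.
    Returns a dict mapping observed arm indices to their losses.
    (Illustrative sketch; the names here are assumptions, not the
    paper's notation.)"""
    observations = {chosen: losses[chosen]}
    for arm in range(len(losses)):
        if arm != chosen and random.random() < r:
            observations[arm] = losses[arm]
    return observations
```

For example, with r = 0 this reduces to standard bandit feedback (only the chosen arm is observed), while r = 1 recovers full-information feedback.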
Cite
Text
Kocák et al. "Online Learning with Erdos-Renyi Side-Observation Graphs." Conference on Uncertainty in Artificial Intelligence, 2016.

BibTeX
@inproceedings{kocak2016uai-online,
title = {{Online Learning with Erdos-Renyi Side-Observation Graphs}},
author = {Kocák, Tomáš and Neu, Gergely and Valko, Michal},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2016},
url = {https://mlanthology.org/uai/2016/kocak2016uai-online/}
}