Bandits with Side Observations: Bounded vs. Logarithmic Regret

Abstract

We consider the classical stochastic multi-armed bandit problem, but where, from time to time and roughly with frequency $\epsilon$, an extra observation is gathered by the agent for free. We prove that, no matter how small $\epsilon$ is, the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm with a regret smaller than $\sum_i \frac{\log(1/\epsilon)}{\Delta_i}$, up to multiplicative constants and $\log\log$ terms. We also prove a matching lower bound, stating that no reasonable algorithm can outperform this quantity.
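To make the setting concrete, below is a minimal Python sketch of the side-observation model under two assumptions not spelled out here: free observations arrive independently with probability $\epsilon$ per round, and each reveals the reward of a uniformly chosen arm. The UCB1 learner is a generic stand-in for illustration, not the algorithm constructed in the paper.

import numpy as np

def simulate(means, eps, horizon, rng):
    """Bernoulli bandit where, with probability eps per round,
    a free observation of a uniformly random arm is also revealed."""
    k = len(means)
    counts = np.zeros(k)   # observations per arm (pulls + free ones)
    sums = np.zeros(k)     # sum of observed rewards per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if np.any(counts == 0):
            arm = int(np.argmin(counts))      # observe each arm once first
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))         # standard UCB1 index
        reward = float(rng.random() < means[arm])
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
        if rng.random() < eps:                # free side observation (assumed model)
            j = int(rng.integers(k))
            counts[j] += 1
            sums[j] += float(rng.random() < means[j])
    return regret

rng = np.random.default_rng(0)
print(simulate([0.5, 0.4, 0.3], eps=0.05, horizon=10_000, rng=rng))

Running this for several values of eps gives a feel for the regime the paper studies: as eps grows, the free observations increasingly identify the best arm on their own and the cumulative regret flattens, consistent with the bounded-regret claim above.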

Cite

Text

Degenne et al. "Bandits with Side Observations: Bounded vs. Logarithmic Regret." Conference on Uncertainty in Artificial Intelligence, 2018.

Markdown

[Degenne et al. "Bandits with Side Observations: Bounded vs. Logarithmic Regret." Conference on Uncertainty in Artificial Intelligence, 2018.](https://mlanthology.org/uai/2018/degenne2018uai-bandits/)

BibTeX

@inproceedings{degenne2018uai-bandits,
  title     = {{Bandits with Side Observations: Bounded vs. Logarithmic Regret}},
  author    = {Degenne, Rémy and Garcelon, Evrard and Perchet, Vianney},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2018},
  pages     = {467--476},
  url       = {https://mlanthology.org/uai/2018/degenne2018uai-bandits/}
}