Analysis of Thompson Sampling for Stochastic Sleeping Bandits

Abstract

We study a variant of the stochastic multi-armed bandit problem where the set of available arms varies arbitrarily with time (also known as the sleeping bandit problem). We focus on the Thompson Sampling algorithm and consider a regret notion defined with respect to the best available arm. Our main result is an O(log T) regret bound for Thompson Sampling, which generalizes a similar bound known for this algorithm from the classical bandit setting. Our bound also matches (up to constants) the best-known lower bound for the sleeping bandit problem. We show via simulations that Thompson Sampling outperforms the UCB-style AUER algorithm for the sleeping bandit problem.
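To make the setting concrete, the following is a minimal sketch (not the paper's exact algorithm or analysis) of Thompson Sampling with Beta priors on a Bernoulli sleeping bandit: each round only the currently available arms may be played, the learner samples a mean estimate from each awake arm's posterior, and it plays the arm with the largest sample. The function name, the `available_sets` representation, and the reward model are illustrative assumptions.

```python
import random

def thompson_sampling_sleeping(means, available_sets, seed=0):
    """Illustrative Thompson Sampling for a Bernoulli sleeping bandit.

    means          -- true (unknown to the learner) success probability per arm
    available_sets -- available_sets[t] is the set of arms awake at round t;
                      availability may vary arbitrarily, as in the paper
    Returns the total reward collected over all rounds.
    """
    rng = random.Random(seed)
    n = len(means)
    successes = [0] * n  # Beta posterior parameters start at Beta(1, 1)
    failures = [0] * n
    total = 0
    for avail in available_sets:
        # Draw one posterior sample per awake arm and play the argmax;
        # regret here is measured against the best *available* arm.
        samples = {a: rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in avail}
        arm = max(samples, key=samples.get)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total += reward
    return total

# Example usage: two arms, where arm 0 occasionally goes to sleep.
rounds = [{0, 1} if t % 5 else {1} for t in range(1000)]
print(thompson_sampling_sleeping([0.9, 0.1], rounds, seed=7))
```

When every arm is always awake this reduces to standard Thompson Sampling; the sleeping structure only restricts which posterior samples compete at each round.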

Cite

Text

Chatterjee et al. "Analysis of Thompson Sampling for Stochastic Sleeping Bandits." Conference on Uncertainty in Artificial Intelligence, 2017.

Markdown

[Chatterjee et al. "Analysis of Thompson Sampling for Stochastic Sleeping Bandits." Conference on Uncertainty in Artificial Intelligence, 2017.](https://mlanthology.org/uai/2017/chatterjee2017uai-analysis/)

BibTeX

@inproceedings{chatterjee2017uai-analysis,
  title     = {{Analysis of Thompson Sampling for Stochastic Sleeping Bandits}},
  author    = {Chatterjee, Aritra and Ghalme, Ganesh and Jain, Shweta and Vaish, Rohit and Narahari, Y.},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2017},
  url       = {https://mlanthology.org/uai/2017/chatterjee2017uai-analysis/}
}