Analysis of Thompson Sampling for Stochastic Sleeping Bandits
Abstract
We study a variant of the stochastic multi-armed bandit problem where the set of available arms varies arbitrarily with time (also known as the sleeping bandit problem). We focus on the Thompson Sampling algorithm and consider a regret notion defined with respect to the best available arm. Our main result is an O(log T) regret bound for Thompson Sampling, which generalizes a similar bound known for this algorithm from the classical bandit setting. Our bound also matches (up to constants) the best-known lower bound for the sleeping bandit problem. We show via simulations that Thompson Sampling outperforms the UCB-style AUER algorithm for the sleeping bandit problem.
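The setting the abstract describes can be illustrated with a minimal simulation sketch: standard Thompson Sampling with Beta–Bernoulli posteriors, modified so that in each round the posterior sampling and the arm choice are restricted to the currently available (awake) arms. All names here (`thompson_sampling_sleeping`, the `availability` callback, the Bernoulli reward model) are illustrative assumptions, not the paper's exact formulation.

```python
import random

def thompson_sampling_sleeping(arm_means, availability, T, seed=0):
    """Sketch of Thompson Sampling for the sleeping bandit setting.

    arm_means    : true Bernoulli reward means (used only to simulate rewards;
                   an assumed reward model, not from the paper)
    availability : function t -> set of arm indices awake in round t
    T            : horizon (number of rounds)
    """
    rng = random.Random(seed)
    K = len(arm_means)
    successes = [0] * K  # Beta(1, 1) prior: posterior is Beta(s + 1, f + 1)
    failures = [0] * K
    total_reward = 0
    for t in range(T):
        awake = availability(t)
        # Sample from the posterior of each awake arm only, then play the
        # arm with the largest sample (regret is against the best awake arm).
        samples = {i: rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in awake}
        arm = max(samples, key=samples.get)
        reward = 1 if rng.random() < arm_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total_reward += reward
    return total_reward
```

With all arms always awake this reduces to classical Thompson Sampling; passing a time-varying `availability` set models the sleeping constraint.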
Cite
Text
Chatterjee et al. "Analysis of Thompson Sampling for Stochastic Sleeping Bandits." Conference on Uncertainty in Artificial Intelligence, 2017.
Markdown
[Chatterjee et al. "Analysis of Thompson Sampling for Stochastic Sleeping Bandits." Conference on Uncertainty in Artificial Intelligence, 2017.](https://mlanthology.org/uai/2017/chatterjee2017uai-analysis/)
BibTeX
@inproceedings{chatterjee2017uai-analysis,
title = {{Analysis of Thompson Sampling for Stochastic Sleeping Bandits}},
author = {Chatterjee, Aritra and Ghalme, Ganesh and Jain, Shweta and Vaish, Rohit and Narahari, Y.},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2017},
url = {https://mlanthology.org/uai/2017/chatterjee2017uai-analysis/}
}