Sub-Sampling for Multi-Armed Bandits

Abstract

The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it does not need to know a set of reward distributions in advance, nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.
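The sub-sampling idea for two arms can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: the function name, the tie-breaking rule, and the cold-start handling are assumptions made for this sketch.

```python
import random

def subsampling_duel(rewards_a, rewards_b, rng=random):
    """One round of a sub-sampling duel between two arms.

    rewards_a, rewards_b: lists of rewards observed so far for each arm.
    The arm with more observations is judged on a random sub-sample of
    size equal to the other arm's count, so both arms are compared on
    the same amount of data. Returns 'a' or 'b', the arm to pull next.
    """
    na, nb = len(rewards_a), len(rewards_b)
    # Cold start (assumption): pull any arm that has no data yet.
    if na == 0:
        return 'a'
    if nb == 0:
        return 'b'
    n = min(na, nb)
    # Sub-sample each history without replacement down to the smaller count.
    mean_a = sum(rng.sample(rewards_a, n)) / n
    mean_b = sum(rng.sample(rewards_b, n)) / n
    # Tie-break (assumption): favour the less-played arm.
    if mean_a == mean_b:
        return 'a' if na <= nb else 'b'
    return 'a' if mean_a > mean_b else 'b'
```

Note that the comparison uses only empirical means of equal-sized samples, which is why the decision is invariant under rescaling of the rewards and needs no knowledge of the reward range.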

Cite

Text

Baransi et al. "Sub-Sampling for Multi-Armed Bandits." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014. doi:10.1007/978-3-662-44848-9_8

Markdown

[Baransi et al. "Sub-Sampling for Multi-Armed Bandits." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014.](https://mlanthology.org/ecmlpkdd/2014/baransi2014ecmlpkdd-subsampling/) doi:10.1007/978-3-662-44848-9_8

BibTeX

@inproceedings{baransi2014ecmlpkdd-subsampling,
  title     = {{Sub-Sampling for Multi-Armed Bandits}},
  author    = {Baransi, Akram and Maillard, Odalric-Ambrym and Mannor, Shie},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2014},
  pages     = {115--131},
  doi       = {10.1007/978-3-662-44848-9_8},
  url       = {https://mlanthology.org/ecmlpkdd/2014/baransi2014ecmlpkdd-subsampling/}
}