Efficient Selection of Multiple Bandit Arms: Theory and Practice

Abstract

We consider the general, widely applicable problem of selecting from n real-valued random variables a subset of size m of those with the highest means, based on as few samples as possible. This problem, which we denote Explore-m, arises as a core component of several stochastic optimization algorithms, and in applications in simulation and industrial engineering. The theoretical basis for our work is an extension of a previous formulation using multi-armed bandits that is devoted to identifying just the one best of n random variables (Explore-1). In addition to providing PAC bounds for the general case, we tailor our theoretically grounded approach to work efficiently in practice. Empirical comparisons of the resulting sampling algorithm against state-of-the-art subset selection strategies demonstrate significant gains in sample efficiency.
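To make the Explore-m setting concrete, the sketch below shows a naive baseline: sample every arm an equal number of times and keep the m arms with the highest empirical means. This is an illustration only, not the paper's algorithm; the function and arm definitions are hypothetical, and the paper's contribution is precisely to allocate samples more efficiently than this uniform scheme while retaining PAC guarantees.

```python
import random

def select_top_m(sample_fns, m, samples_per_arm=1000, rng=None):
    """Naive uniform-sampling baseline for Explore-m: sample each of the
    n arms equally often, then return the indices of the m arms with the
    highest empirical means.  (Hypothetical illustration; the paper's
    method adapts the sample allocation to the observed means.)"""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    means = []
    for f in sample_fns:
        total = sum(f(rng) for _ in range(samples_per_arm))
        means.append(total / samples_per_arm)
    # Indices of the m largest empirical means.
    return sorted(range(len(means)), key=lambda i: means[i], reverse=True)[:m]

# Example: 5 Bernoulli arms with the given true means; select the top 2.
true_means = [0.1, 0.9, 0.5, 0.8, 0.3]
arms = [lambda rng, p=p: 1 if rng.random() < p else 0 for p in true_means]
best = select_top_m(arms, m=2)
print(sorted(best))
```

With 1000 samples per arm and a 0.3 gap between the m-th and (m+1)-th means, this baseline almost always recovers the true top-2 set {1, 3}; the interesting regime, which the paper addresses, is when sampling is expensive and the gaps are small.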

Cite

Text

Kalyanakrishnan and Stone. "Efficient Selection of Multiple Bandit Arms: Theory and Practice." International Conference on Machine Learning, 2010.

Markdown

[Kalyanakrishnan and Stone. "Efficient Selection of Multiple Bandit Arms: Theory and Practice." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/kalyanakrishnan2010icml-efficient/)

BibTeX

@inproceedings{kalyanakrishnan2010icml-efficient,
  title     = {{Efficient Selection of Multiple Bandit Arms: Theory and Practice}},
  author    = {Kalyanakrishnan, Shivaram and Stone, Peter},
  booktitle = {International Conference on Machine Learning},
  year      = {2010},
  pages     = {511--518},
  url       = {https://mlanthology.org/icml/2010/kalyanakrishnan2010icml-efficient/}
}