PAC Subset Selection in Stochastic Multi-Armed Bandits

Kalyanakrishnan, Shivaram; Tewari, Ambuj; Auer, Peter; Stone, Peter

PAC Subset Selection in Stochastic Multi-Armed Bandits

Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, Peter Stone

ICML 2012

/icml/2012/kalyanakrishnan2012icml-pac/

Abstract

We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This “sub-set selection ” problem finds application in a variety of areas. In the authors ’ previ-ous work (Kalyanakrishnan &amp; Stone, 2010), this problem is framed under a PAC setting (denoted “Explore-m”), and corresponding sampling algorithms are analyzed. Whereas the formal analysis therein is restricted to the worst case sample complexity of algo-rithms, in this paper, we design and ana-lyze an algorithm (“LUCB”) with improved expected sample complexity. Interestingly LUCB bears a close resemblance to the well-known UCB algorithm for regret minimiza-tion. The expected sample complexity bound we show for LUCB is novel even for single-arm selection (Explore-1). We also give a lower bound on the worst case sample com-plexity of PAC algorithms for Explore-m. 1.

PDF Semantic Scholar

Cite

Text

Kalyanakrishnan et al. "PAC Subset Selection in Stochastic Multi-Armed Bandits." International Conference on Machine Learning, 2012.

Markdown

[Kalyanakrishnan et al. "PAC Subset Selection in Stochastic Multi-Armed Bandits." International Conference on Machine Learning, 2012.](https://mlanthology.org/icml/2012/kalyanakrishnan2012icml-pac/)

BibTeX

@inproceedings{kalyanakrishnan2012icml-pac,
  title     = {{PAC Subset Selection in Stochastic Multi-Armed Bandits}},
  author    = {Kalyanakrishnan, Shivaram and Tewari, Ambuj and Auer, Peter and Stone, Peter},
  booktitle = {International Conference on Machine Learning},
  year      = {2012},
  url       = {https://mlanthology.org/icml/2012/kalyanakrishnan2012icml-pac/}
}