PAC Subset Selection in Stochastic Multi-Armed Bandits
Abstract
We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This “sub-set selection ” problem finds application in a variety of areas. In the authors ’ previ-ous work (Kalyanakrishnan & Stone, 2010), this problem is framed under a PAC setting (denoted “Explore-m”), and corresponding sampling algorithms are analyzed. Whereas the formal analysis therein is restricted to the worst case sample complexity of algo-rithms, in this paper, we design and ana-lyze an algorithm (“LUCB”) with improved expected sample complexity. Interestingly LUCB bears a close resemblance to the well-known UCB algorithm for regret minimiza-tion. The expected sample complexity bound we show for LUCB is novel even for single-arm selection (Explore-1). We also give a lower bound on the worst case sample com-plexity of PAC algorithms for Explore-m. 1.
Cite
Text
Kalyanakrishnan et al. "PAC Subset Selection in Stochastic Multi-Armed Bandits." International Conference on Machine Learning, 2012.Markdown
[Kalyanakrishnan et al. "PAC Subset Selection in Stochastic Multi-Armed Bandits." International Conference on Machine Learning, 2012.](https://mlanthology.org/icml/2012/kalyanakrishnan2012icml-pac/)BibTeX
@inproceedings{kalyanakrishnan2012icml-pac,
title = {{PAC Subset Selection in Stochastic Multi-Armed Bandits}},
author = {Kalyanakrishnan, Shivaram and Tewari, Ambuj and Auer, Peter and Stone, Peter},
booktitle = {International Conference on Machine Learning},
year = {2012},
url = {https://mlanthology.org/icml/2012/kalyanakrishnan2012icml-pac/}
}