Algorithms for Infinitely Many-Armed Bandits
Abstract
We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a newly selected arm which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms and provide upper-bounds on the resulting expected regret. We also derive a lower-bound which matches (up to logarithmic factors) the upper-bound in some cases.
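The core idea in the abstract — run an upper-confidence-bound policy on a restricted set of arms drawn at random from the infinite pool — can be illustrated with a minimal sketch. This is not the paper's exact UCB-AIR algorithm (which also tunes the subset size as a function of the horizon and the near-optimality exponent); the helper names `draw_arm_mean` and `pull` and the choice of plain UCB1 indices are assumptions made for illustration.

```python
import math
import random

def ucb_on_random_subset(draw_arm_mean, pull, K, n, seed=0):
    """Sample K arms i.i.d. from the arm reservoir, then run UCB1 on
    them for n rounds. Returns the total collected reward."""
    rng = random.Random(seed)
    # Restricted set: K mean-rewards drawn from the reservoir distribution.
    means = [draw_arm_mean(rng) for _ in range(K)]
    counts = [0] * K
    sums = [0.0] * K
    total_reward = 0.0
    for t in range(1, n + 1):
        if t <= K:
            i = t - 1  # initialization: pull each selected arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            i = max(range(K), key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull(means[i], rng)
        counts[i] += 1
        sums[i] += r
        total_reward += r
    return total_reward

# Example reservoir: mean-rewards uniform on [0, 1], so the probability
# that a new arm is eps-optimal is of order eps; rewards are Bernoulli.
reward = ucb_on_random_subset(
    draw_arm_mean=lambda rng: rng.random(),
    pull=lambda mu, rng: 1.0 if rng.random() < mu else 0.0,
    K=30, n=5000)
```

With K arms sampled uniformly, the best selected arm has mean close to 1 (about 1 - 1/(K+1) in expectation), and UCB1 concentrates its pulls on it, so the average per-round reward ends up well above the reservoir mean of 0.5.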
Cite
Text
Wang et al. "Algorithms for Infinitely Many-Armed Bandits." Neural Information Processing Systems, 2008.

Markdown
[Wang et al. "Algorithms for Infinitely Many-Armed Bandits." Neural Information Processing Systems, 2008.](https://mlanthology.org/neurips/2008/wang2008neurips-algorithms/)

BibTeX
@inproceedings{wang2008neurips-algorithms,
title = {{Algorithms for Infinitely Many-Armed Bandits}},
author = {Wang, Yizao and Audibert, Jean-Yves and Munos, Rémi},
booktitle = {Neural Information Processing Systems},
year = {2008},
pages = {1729--1736},
url = {https://mlanthology.org/neurips/2008/wang2008neurips-algorithms/}
}