Algorithms for Infinitely Many-Armed Bandits
Abstract
We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a newly selected arm which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms and provide upper-bounds on the resulting expected regret. We also derive a lower-bound which matches (up to logarithmic factors) the upper-bound in some cases.
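The core idea in the abstract — run an upper-confidence-bound policy on a restricted set of arms drawn at random from the infinite pool — can be illustrated with a minimal sketch. This is not the paper's exact UCB-AIR algorithm (which also tunes the subset size as a function of the horizon and the near-optimality exponent); the helper names `draw_arm_mean` and `pull` and the choice of plain UCB1 indices are assumptions made for illustration.

```python
import math
import random

def ucb_on_random_subset(draw_arm_mean, pull, K, n, seed=0):
    """Sample K arms i.i.d. from the arm reservoir, then run UCB1 on
    them for n rounds. Returns the total collected reward."""
    rng = random.Random(seed)
    # Restricted set: K mean-rewards drawn from the reservoir distribution.
    means = [draw_arm_mean(rng) for _ in range(K)]
    counts = [0] * K
    sums = [0.0] * K
    total_reward = 0.0
    for t in range(1, n + 1):
        if t <= K:
            i = t - 1  # initialization: pull each selected arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            i = max(range(K), key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull(means[i], rng)
        counts[i] += 1
        sums[i] += r
        total_reward += r
    return total_reward

# Example reservoir: mean-rewards uniform on [0, 1], so the probability
# that a new arm is eps-optimal is of order eps; rewards are Bernoulli.
reward = ucb_on_random_subset(
    draw_arm_mean=lambda rng: rng.random(),
    pull=lambda mu, rng: 1.0 if rng.random() < mu else 0.0,
    K=30, n=5000)
```

With K arms sampled uniformly, the best selected arm has mean close to 1 (about 1 - 1/(K+1) in expectation), and UCB1 concentrates its pulls on it, so the average per-round reward ends up well above the reservoir mean of 0.5.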
Cite
Text
Wang et al. "Algorithms for Infinitely Many-Armed Bandits." Neural Information Processing Systems, 2008.

Markdown
[Wang et al. "Algorithms for Infinitely Many-Armed Bandits." Neural Information Processing Systems, 2008.](https://mlanthology.org/neurips/2008/wang2008neurips-algorithms/)

BibTeX
@inproceedings{wang2008neurips-algorithms,
title = {{Algorithms for Infinitely Many-Armed Bandits}},
author = {Wang, Yizao and Audibert, Jean-Yves and Munos, Rémi},
booktitle = {Neural Information Processing Systems},
year = {2008},
pages = {1729--1736},
url = {https://mlanthology.org/neurips/2008/wang2008neurips-algorithms/}
}