Refined Algorithms for Infinitely Many-Armed Bandits with Deterministic Rewards

David, Yahel; Shimkin, Nahum

doi:10.1007/978-3-319-23528-8_29

Refined Algorithms for Infinitely Many-Armed Bandits with Deterministic Rewards

Yahel David, Nahum Shimkin

ECML-PKDD 2015 pp. 464-479

doi:10.1007/978-3-319-23528-8_29 /ecmlpkdd/2015/david2015ecmlpkdd-refined/

Abstract

We consider a variant of the Multi-Armed Bandit problem which involves a large pool of a priori identical arms (or items). Each arm is associated with a deterministic value, which is sampled from a probability distribution with unknown maximal value, and is revealed once that arm is chosen. At each time instant the agent may choose a new arm (with unknown value), or a previously-chosen arm whose value is already revealed. The goal is to minimize the cumulative regret relative to the best arm in the pool. Previous work has established a lower bound on the regret for this model, depending on the functional form of the tail of the sample distribution, as well as algorithms that attain this bound up to logarithmic terms. Here, we present a more refined algorithm that attains the same order as the lower bound. We further consider several variants of the basic model, involving an anytime algorithm and the case of non-retainable arms. Numerical experiments demonstrate the superior performance of the suggested algorithms.

PDF ECML-PKDD Semantic Scholar

Cite

Text

David and Shimkin. "Refined Algorithms for Infinitely Many-Armed Bandits with Deterministic Rewards." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2015. doi:10.1007/978-3-319-23528-8_29

Markdown

[David and Shimkin. "Refined Algorithms for Infinitely Many-Armed Bandits with Deterministic Rewards." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2015.](https://mlanthology.org/ecmlpkdd/2015/david2015ecmlpkdd-refined/) doi:10.1007/978-3-319-23528-8_29

BibTeX

@inproceedings{david2015ecmlpkdd-refined,
  title     = {{Refined Algorithms for Infinitely Many-Armed Bandits with Deterministic Rewards}},
  author    = {David, Yahel and Shimkin, Nahum},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2015},
  pages     = {464-479},
  doi       = {10.1007/978-3-319-23528-8_29},
  url       = {https://mlanthology.org/ecmlpkdd/2015/david2015ecmlpkdd-refined/}
}