The Budgeted Multi-Armed Bandit Problem

Abstract

The following coins problem is a version of a multi-armed bandit problem where one has to select from among a set of objects, say classifiers, after an experimentation phase that is constrained by a time or cost budget. The question is how to spend the budget. The problem involves pure exploration only, differentiating it from typical multi-armed bandit problems involving an exploration/exploitation tradeoff [BF85]. It is an abstraction of the following scenarios: choosing from among a set of alternative treatments after a fixed number of clinical trials, determining the best parameter settings for a program given a deadline that only allows a fixed number of runs; or choosing a life partner in the bachelor/bachelorette TV show where time is limited. We are interested in the computational complexity of the coins problem and/or efficient algorithms with approximation guarantees.

Cite

Text

Madani et al. "The Budgeted Multi-Armed Bandit Problem." Annual Conference on Computational Learning Theory, 2004. doi:10.1007/978-3-540-27819-1_46

Markdown

[Madani et al. "The Budgeted Multi-Armed Bandit Problem." Annual Conference on Computational Learning Theory, 2004.](https://mlanthology.org/colt/2004/madani2004colt-budgeted/) doi:10.1007/978-3-540-27819-1_46

BibTeX

@inproceedings{madani2004colt-budgeted,
  title     = {{The Budgeted Multi-Armed Bandit Problem}},
  author    = {Madani, Omid and Lizotte, Daniel J. and Greiner, Russell},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2004},
  pages     = {643-645},
  doi       = {10.1007/978-3-540-27819-1_46},
  url       = {https://mlanthology.org/colt/2004/madani2004colt-budgeted/}
}