Infinitely Many-Armed Bandits with Budget Constraints

Abstract

We study the infinitely many-armed bandit problem with budget constraints, where the number of arms can be infinite and much larger than the number of possible experiments. The player aims to maximize his/her total expected reward under a budget constraint B on the cost of pulling arms. We introduce a weak stochastic assumption on the ratio of the expected reward to the expected cost of a newly pulled arm, which characterizes its probability of being a near-optimal arm. We propose an algorithm named RCB-I for this new problem, in which the player first randomly picks K arms, where K is sub-linear in B, and then runs the algorithm for the finite-arm setting on the selected arms. Theoretical analysis shows that this simple algorithm enjoys sub-linear regret in terms of the budget B. We also provide a lower bound for any algorithm under the Bernoulli setting; the regret bound of RCB-I matches this lower bound up to a logarithmic factor. We further extend the algorithm to the any-budget setting (i.e., the budget is unknown in advance) and provide a corresponding theoretical analysis.
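The two-phase structure described in the abstract (randomly select K arms, then play a budgeted finite-arm strategy on them) can be sketched in Python as below. This is only an illustrative sketch under stated assumptions: the callback pull(arm) returning a (reward, cost) pair is hypothetical, the choice K = ceil(B**alpha) with alpha < 1 is just one way to make K sub-linear in B, and the UCB-style index on the empirical reward-to-cost ratio is an assumed stand-in, not the exact RCB-I index from the paper.

  import math
  import random

  def rcb_i_sketch(pull, sample_arm, budget, alpha=0.5):
      """Illustrative two-phase sketch (not the paper's exact algorithm).

      pull(arm)    -- hypothetical callback returning (reward, cost) for one pull
      sample_arm() -- hypothetical callback drawing a fresh arm from the infinite pool
      budget       -- total budget B for pulling arms
      alpha        -- exponent making K sub-linear in B (assumption)
      """
      # Phase 1: randomly pick K arms, with K sub-linear in the budget B.
      K = max(1, math.ceil(budget ** alpha))
      arms = [sample_arm() for _ in range(K)]

      n = [0] * K            # number of pulls of each selected arm
      reward_sum = [0.0] * K
      cost_sum = [0.0] * K
      total_reward, t = 0.0, 0

      # Initialization: pull each selected arm once (while budget remains).
      for i in range(K):
          if budget <= 0:
              return total_reward
          r, c = pull(arms[i])
          n[i] += 1; reward_sum[i] += r; cost_sum[i] += c
          budget -= c; total_reward += r; t += 1

      # Phase 2: budgeted UCB-style play on the selected arms
      # (index = empirical reward/cost ratio + exploration bonus; illustrative only).
      while budget > 0:
          t += 1
          def index(i):
              bonus = math.sqrt(2.0 * math.log(t) / n[i])
              return reward_sum[i] / max(cost_sum[i], 1e-12) + bonus
          i = max(range(K), key=index)
          r, c = pull(arms[i])
          n[i] += 1; reward_sum[i] += r; cost_sum[i] += c
          budget -= c; total_reward += r
      return total_reward

For example, with Bernoulli rewards and costs one could pass pull = lambda arm: (float(random.random() < arm[0]), float(random.random() < arm[1])) and sample_arm = lambda: (random.random(), random.random()); these distributions are chosen only to make the sketch runnable.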

Cite

Text

Li and Xia. "Infinitely Many-Armed Bandits with Budget Constraints." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10881

Markdown

[Li and Xia. "Infinitely Many-Armed Bandits with Budget Constraints." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/li2017aaai-infinitely/) doi:10.1609/AAAI.V31I1.10881

BibTeX

@inproceedings{li2017aaai-infinitely,
  title     = {{Infinitely Many-Armed Bandits with Budget Constraints}},
  author    = {Li, Haifang and Xia, Yingce},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {2182-2188},
  doi       = {10.1609/AAAI.V31I1.10881},
  url       = {https://mlanthology.org/aaai/2017/li2017aaai-infinitely/}
}