Greedy Confidence Pursuit: A Pragmatic Approach to Multi-Bandit Optimization

Abstract

We address the practical problem of maximizing the number of high-confidence results produced among multiple experiments sharing an exhaustible pool of resources. We formalize this problem in the framework of bandit optimization as follows: given a set of multiple multi-armed bandits and a budget on the total number of trials allocated among them, select the top-m arms (with high confidence) for as many of the bandits as possible. To solve this problem, which we call greedy confidence pursuit, we develop a method based on posterior sampling. We show empirically that our method outperforms existing methods for top-m selection in single bandits, which has been studied previously, and improves on baseline methods for the full greedy confidence pursuit problem, which has not been studied previously.
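The posterior-sampling idea behind the method can be illustrated with a minimal sketch for top-m selection in a single Bernoulli bandit. This is an assumed, simplified variant (Thompson-style sampling with Beta posteriors), not the paper's exact algorithm: each round, a mean is sampled from every arm's posterior, one arm among the sampled top-m is pulled, and the posterior is updated.

```python
import random

def posterior_sampling_top_m(true_means, m, budget, seed=0):
    """Illustrative posterior-sampling loop for top-m arm selection in a
    single Bernoulli bandit (a simplified sketch, not the paper's method)."""
    rng = random.Random(seed)
    k = len(true_means)
    # Beta(1, 1) uniform priors over each arm's success probability.
    successes = [1] * k
    failures = [1] * k
    for _ in range(budget):
        # Sample a plausible mean for each arm from its Beta posterior.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # Pull one arm, chosen uniformly among the sampled top-m.
        top = sorted(range(k), key=lambda i: samples[i], reverse=True)[:m]
        arm = rng.choice(top)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
    # Report the m arms with the highest posterior means.
    post_means = [successes[i] / (successes[i] + failures[i]) for i in range(k)]
    return sorted(range(k), key=lambda i: post_means[i], reverse=True)[:m]
```

With well-separated arms and a reasonable budget, the loop concentrates trials on the contenders for the top-m set, e.g. `posterior_sampling_top_m([0.9, 0.8, 0.2, 0.1], m=2, budget=3000)` identifies arms 0 and 1.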

Cite

Text

Bachman and Precup. "Greedy Confidence Pursuit: A Pragmatic Approach to Multi-Bandit Optimization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013. doi:10.1007/978-3-642-40988-2_16

Markdown

[Bachman and Precup. "Greedy Confidence Pursuit: A Pragmatic Approach to Multi-Bandit Optimization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013.](https://mlanthology.org/ecmlpkdd/2013/bachman2013ecmlpkdd-greedy/) doi:10.1007/978-3-642-40988-2_16

BibTeX

@inproceedings{bachman2013ecmlpkdd-greedy,
  title     = {{Greedy Confidence Pursuit: A Pragmatic Approach to Multi-Bandit Optimization}},
  author    = {Bachman, Philip and Precup, Doina},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2013},
  pages     = {241--256},
  doi       = {10.1007/978-3-642-40988-2_16},
  url       = {https://mlanthology.org/ecmlpkdd/2013/bachman2013ecmlpkdd-greedy/}
}