PAC Identification of a Bandit Arm Relative to a Reward Quantile

Abstract

We propose a PAC formulation for identifying an arm in an n-armed bandit whose mean is within a fixed tolerance of the m-th highest mean. This setup generalises a previous formulation with m = 1, and differs from yet another one which requires m such arms to be identified. The key implication of our proposed approach is the ability to derive upper bounds on the sample complexity that depend on n/m in place of n. Consequently, even when the number of arms is infinite, we only need a finite number of samples to identify an arm that compares favourably with a fixed reward quantile. This facility makes our approach attractive to applications such as drug discovery, wherein the number of arms (molecular configurations) may run into a few thousands. We present sampling algorithms for both the finite- and infinite-armed cases, and validate their efficiency through theoretical and experimental analysis.We also present a lower bound on the worst case sample complexity of PAC algorithms for our problem, which matches our upper bound up to a logarithmic factor.

Cite

Text

Chaudhuri and Kalyanakrishnan. "PAC Identification of a Bandit Arm Relative to a Reward Quantile." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10802

Markdown

[Chaudhuri and Kalyanakrishnan. "PAC Identification of a Bandit Arm Relative to a Reward Quantile." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/chaudhuri2017aaai-pac/) doi:10.1609/AAAI.V31I1.10802

BibTeX

@inproceedings{chaudhuri2017aaai-pac,
  title     = {{PAC Identification of a Bandit Arm Relative to a Reward Quantile}},
  author    = {Chaudhuri, Arghya Roy and Kalyanakrishnan, Shivaram},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {1777-1783},
  doi       = {10.1609/AAAI.V31I1.10802},
  url       = {https://mlanthology.org/aaai/2017/chaudhuri2017aaai-pac/}
}