Pure Exploration for Max-Quantile Bandits

Abstract

We consider a variant of the pure exploration problem in Multi-Armed Bandits, where the goal is to find the arm for which the $\lambda $ -quantile is maximal. Within the PAC framework, we provide a lower bound on the sample complexity of any $(\epsilon ,\delta )$ -correct algorithm, and propose algorithms with matching upper bounds. Our bounds sharpen existing ones by explicitly incorporating the quantile factor $\lambda $ . We further provide experiments that compare the sample complexity of our algorithms with that of previous works.

Cite

Text

David and Shimkin. "Pure Exploration for Max-Quantile Bandits." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016. doi:10.1007/978-3-319-46128-1_35

Markdown

[David and Shimkin. "Pure Exploration for Max-Quantile Bandits." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.](https://mlanthology.org/ecmlpkdd/2016/david2016ecmlpkdd-pure/) doi:10.1007/978-3-319-46128-1_35

BibTeX

@inproceedings{david2016ecmlpkdd-pure,
  title     = {{Pure Exploration for Max-Quantile Bandits}},
  author    = {David, Yahel and Shimkin, Nahum},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2016},
  pages     = {556-571},
  doi       = {10.1007/978-3-319-46128-1_35},
  url       = {https://mlanthology.org/ecmlpkdd/2016/david2016ecmlpkdd-pure/}
}