Pure Exploration for Max-Quantile Bandits
Abstract
We consider a variant of the pure exploration problem in Multi-Armed Bandits, where the goal is to find the arm for which the $\lambda $ -quantile is maximal. Within the PAC framework, we provide a lower bound on the sample complexity of any $(\epsilon ,\delta )$ -correct algorithm, and propose algorithms with matching upper bounds. Our bounds sharpen existing ones by explicitly incorporating the quantile factor $\lambda $ . We further provide experiments that compare the sample complexity of our algorithms with that of previous works.
Cite
Text
David and Shimkin. "Pure Exploration for Max-Quantile Bandits." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016. doi:10.1007/978-3-319-46128-1_35Markdown
[David and Shimkin. "Pure Exploration for Max-Quantile Bandits." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.](https://mlanthology.org/ecmlpkdd/2016/david2016ecmlpkdd-pure/) doi:10.1007/978-3-319-46128-1_35BibTeX
@inproceedings{david2016ecmlpkdd-pure,
title = {{Pure Exploration for Max-Quantile Bandits}},
author = {David, Yahel and Shimkin, Nahum},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2016},
pages = {556-571},
doi = {10.1007/978-3-319-46128-1_35},
url = {https://mlanthology.org/ecmlpkdd/2016/david2016ecmlpkdd-pure/}
}