Quantile-Regret Minimisation in Infinitely Many-Armed Bandits
Abstract
The stochastic multi-armed bandit is a well-studied abstraction of decision making in the face of uncertainty. We consider the setting in which the number of bandit arms is much larger than the number of possible pulls, and can even be infinite. With the aim of minimising regret with respect to an optimal arm, existing methods for this setting either assume some structure over the set of arms (Kleinberg et al., 2008, Ray Chowdhury and Gopalan, 2017), or some property of the reward distribution (Wang et al., 2008). Invariably, the validity of such assumptions, and therefore the performance of the corresponding methods, depends on instance-specific parameters, which might not be known beforehand. We propose a conceptually simple, parameter-free, and practically effective alternative. Specifically, we introduce a notion of regret with respect to the top quantile of a probability distribution over the expected reward of randomly drawn arms. Our main contribution is an algorithm that achieves sublinear "quantile-regret", both (1) when a target quantile is specified, and (2) when the quantile can be an arbitrary (unknown) positive value. The algorithm needs no side information about the arms or about the structure of their reward distributions: it relies on random sampling to reach arms in the top quantile. Experiments show that our algorithm outperforms several previous methods (in terms of conventional regret) when the latter are not tuned well, and often even when they are.
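The random-sampling idea behind quantile-regret can be illustrated with a minimal sketch. This is not the paper's algorithm: the uniform arm-mean distribution, the quantile `RHO`, the horizon, the sample size, and the use of UCB1 over the sampled arms are all illustrative assumptions. The point is only that sampling roughly `log(1/delta)/RHO` arms at random puts at least one top-quantile arm in the candidate set with high probability, after which a standard finite-armed bandit algorithm suffices.

```python
import math
import random

random.seed(0)

RHO = 0.1         # target top quantile (assumed known in this sketch)
DELTA = 0.05      # allowed probability of missing the top quantile
HORIZON = 20_000  # total number of pulls

# Infinitely many Bernoulli arms: each arm's mean is drawn Uniform(0, 1),
# so the (1 - RHO)-quantile of the mean distribution is simply 1 - RHO.
def draw_arm_mean():
    return random.random()

# Sample n arms so that P(at least one sampled arm is in the top RHO
# quantile) >= 1 - DELTA, since 1 - (1 - RHO)^n >= 1 - DELTA for this n.
n_arms = int(math.ceil(math.log(1 / DELTA) / RHO))
means = [draw_arm_mean() for _ in range(n_arms)]

counts = [0] * n_arms
sums = [0.0] * n_arms
total_reward = 0.0

for t in range(1, HORIZON + 1):
    if t <= n_arms:
        i = t - 1  # pull each sampled arm once to initialise
    else:
        # UCB1 index over the finite sampled set (illustrative choice)
        i = max(range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]))
    reward = 1.0 if random.random() < means[i] else 0.0
    counts[i] += 1
    sums[i] += reward
    total_reward += reward

# Regret measured against the top-quantile benchmark, not the best arm.
quantile_benchmark = 1 - RHO
quantile_regret = quantile_benchmark * HORIZON - total_reward
print(f"sampled arms: {n_arms}, quantile-regret: {quantile_regret:.1f}")
```

Because the benchmark is the quantile value rather than the (possibly unreachable) optimal arm, the realised quantile-regret here grows much more slowly than the horizon, which is the behaviour the abstract refers to as sublinear quantile-regret.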
Cite
Text
Chaudhuri and Kalyanakrishnan. "Quantile-Regret Minimisation in Infinitely Many-Armed Bandits." Conference on Uncertainty in Artificial Intelligence, 2018.

Markdown

[Chaudhuri and Kalyanakrishnan. "Quantile-Regret Minimisation in Infinitely Many-Armed Bandits." Conference on Uncertainty in Artificial Intelligence, 2018.](https://mlanthology.org/uai/2018/chaudhuri2018uai-quantile/)

BibTeX
@inproceedings{chaudhuri2018uai-quantile,
title = {{Quantile-Regret Minimisation in Infinitely Many-Armed Bandits}},
author = {Chaudhuri, Arghya Roy and Kalyanakrishnan, Shivaram},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2018},
pages = {425-434},
url = {https://mlanthology.org/uai/2018/chaudhuri2018uai-quantile/}
}