Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits
Abstract
Upper Confidence Bound (UCB) methods are one of the most effective methods in dealing with the exploration-exploitation trade-off in online decision-making problems. The confidence bounds utilized in UCB methods tend to be constructed based on concentration equalities which are usually dependent on a parameter of scale (e.g. a bound on the payoffs, a variance, or a subgaussian parameter) that must be known in advance. The necessity of knowing a scale parameter a priori and the fact that the confidence bounds only use the tail information can deteriorate the performance of the UCB methods. Here we propose a data-dependent UCB algorithm called MARS (Maximum Average Randomly Sampled) in a non-parametric setup for multi-armed bandits with symmetric rewards. The algorithm does not depend on any scaling, and the data-dependent upper confidence bound is constructed based on the maximum average of randomly sampled rewards inspired by the work of Hartigan in the 1960s and 70s. A regret bound for the multi-armed bandit problem is derived under the same assumptions as for the $\psi$-UCB method without incorporating any correction factors. The method is illustrated and compared with baseline algorithms in numerical experiments.
Cite
Text
Khorasani and Weyer. "Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits." Neural Information Processing Systems, 2023.Markdown
[Khorasani and Weyer. "Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/khorasani2023neurips-maximum/)BibTeX
@inproceedings{khorasani2023neurips-maximum,
title = {{Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits}},
author = {Khorasani, Masoud Moravej and Weyer, Erik},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/khorasani2023neurips-maximum/}
}