Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits

Khorasani, Masoud Moravej; Weyer, Erik

Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits

NeurIPS 2023

/neurips/2023/khorasani2023neurips-maximum/

Abstract

Upper Confidence Bound (UCB) methods are one of the most effective methods in dealing with the exploration-exploitation trade-off in online decision-making problems. The confidence bounds utilized in UCB methods tend to be constructed based on concentration equalities which are usually dependent on a parameter of scale (e.g. a bound on the payoffs, a variance, or a subgaussian parameter) that must be known in advance. The necessity of knowing a scale parameter a priori and the fact that the confidence bounds only use the tail information can deteriorate the performance of the UCB methods. Here we propose a data-dependent UCB algorithm called MARS (Maximum Average Randomly Sampled) in a non-parametric setup for multi-armed bandits with symmetric rewards. The algorithm does not depend on any scaling, and the data-dependent upper confidence bound is constructed based on the maximum average of randomly sampled rewards inspired by the work of Hartigan in the 1960s and 70s. A regret bound for the multi-armed bandit problem is derived under the same assumptions as for the $\psi$-UCB method without incorporating any correction factors. The method is illustrated and compared with baseline algorithms in numerical experiments.

PDF NeurIPS OpenReview Semantic Scholar

Cite

Text

Khorasani and Weyer. "Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits." Neural Information Processing Systems, 2023.

Markdown

[Khorasani and Weyer. "Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/khorasani2023neurips-maximum/)

BibTeX

@inproceedings{khorasani2023neurips-maximum,
  title     = {{Maximum Average Randomly Sampled: A Scale Free and Non-Parametric Algorithm for Stochastic Bandits}},
  author    = {Khorasani, Masoud Moravej and Weyer, Erik},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/khorasani2023neurips-maximum/}
}