Stochastic Rising Bandits
Abstract
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms’ expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. Finally, we empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset, illustrating the effectiveness of the proposed approaches.
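
The rested setting can be made concrete with a short simulation: each arm's expected payoff grows with the number of times that arm has been pulled, and an optimistic index must account for future growth, not just the current estimate. Below is a minimal sketch loosely inspired by the sliding-window idea behind R-ed-UCB. The window fraction `eps`, the linear extrapolation term, the bonus constant `sigma`, and the exponential reward curves are all illustrative assumptions; the paper's actual estimator and confidence bonus are defined differently, so this is a sketch of the setting rather than the authors' algorithm.

```python
# Sketch of a rested rising bandit with a simplified sliding-window
# optimistic index. Constants and the bonus form are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
T, K = 5000, 3

def mean_reward(arm, n_pulls):
    # Rested rising arms: expected payoff is non-decreasing in the
    # number of pulls, saturating at an arm-specific plateau.
    plateaus = [0.9, 0.7, 0.5]
    rates = [0.002, 0.01, 0.05]
    return plateaus[arm] * (1.0 - np.exp(-rates[arm] * n_pulls))

history = [[] for _ in range(K)]  # per-arm observed rewards

def index(arm, t, eps=0.25, sigma=0.5):
    rewards = history[arm]
    n = len(rewards)
    if n < 2:
        return np.inf  # force initial exploration of each arm
    h = max(1, int(eps * n))                # sliding-window size
    recent = np.mean(rewards[-h:])          # mean of the last h pulls
    previous = np.mean(rewards[-2 * h:-h]) if n >= 2 * h else np.mean(rewards[:h])
    slope = max(0.0, (recent - previous) / h)  # estimated per-pull increase
    # Optimistic extrapolation over the remaining horizon plus a
    # UCB-style exploration bonus shrinking with the window size.
    return recent + slope * (T - t) + sigma * np.sqrt(np.log(T) / h)

for t in range(T):
    arm = int(np.argmax([index(a, t) for a in range(K)]))
    mu = mean_reward(arm, len(history[arm]))
    history[arm].append(mu + rng.normal(0.0, 0.1))  # noisy payoff

print("pulls per arm:", [len(h) for h in history])
```

A plain UCB index would chase whichever arm currently looks best, whereas the extrapolation term lets a slowly rising arm with a high plateau (arm 0 above) eventually dominate; the paper's analysis makes this intuition precise for both the rested and restless cases.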
Cite

Text

Metelli et al. "Stochastic Rising Bandits." International Conference on Machine Learning, 2022.

Markdown

[Metelli et al. "Stochastic Rising Bandits." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/metelli2022icml-stochastic/)

BibTeX
@inproceedings{metelli2022icml-stochastic,
title = {{Stochastic Rising Bandits}},
author = {Metelli, Alberto Maria and Trovò, Francesco and Pirola, Matteo and Restelli, Marcello},
booktitle = {International Conference on Machine Learning},
year = {2022},
pages = {15421--15457},
volume = {162},
url = {https://mlanthology.org/icml/2022/metelli2022icml-stochastic/}
}