A Stochastic Bandit Algorithm for Scratch Games
Abstract
Stochastic multi-armed bandit algorithms are used to solve the exploration-exploitation dilemma in sequential optimization problems. Algorithms based on upper confidence bounds offer strong theoretical guarantees; they are easy to implement and efficient in practice. We consider a new bandit setting, called "scratch games", where arm budgets are limited and rewards are drawn without replacement. Using the Serfling inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that the bound on the expected number of plays of a suboptimal arm is lower than that of the UCB1 policy. We illustrate this result on both synthetic and realistic problems (ad-serving and emailing campaign optimization).
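To make the setting concrete, here is a minimal sketch of a UCB-style policy for scratch games, where each arm holds a finite pool of rewards scratched without replacement. The exploration bonus uses the Serfling finite-population factor (1 - (n-1)/N), which shrinks the confidence interval as an arm nears exhaustion; the exact index, constants, and tie-breaking in the paper may differ, so treat this only as an illustration of the idea.

```python
import math
import random

def serfling_ucb(arms, horizon, rho=2.0):
    """Play scratch games with a UCB index corrected for sampling
    without replacement (illustrative sketch, not the paper's exact policy).

    arms: list of finite reward pools (one list of tickets per arm).
    Index of arm i after n draws out of a pool of size N:
        mean + sqrt(rho * f * ln(t) / n),  with  f = 1 - (n - 1) / N,
    the Serfling finite-population factor.
    """
    budgets = [list(a) for a in arms]      # remaining tickets per arm
    for tickets in budgets:
        random.shuffle(tickets)            # draw order = without replacement
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    total = 0.0
    for t in range(1, horizon + 1):
        live = [i for i, b in enumerate(budgets) if b]
        if not live:                       # every scratch game exhausted
            break
        def index(i):
            n, big_n = counts[i], len(arms[i])
            if n == 0:
                return float("inf")        # force one initial pull per arm
            f = 1.0 - (n - 1) / big_n      # shrinks as the pool is consumed
            return sums[i] / n + math.sqrt(rho * f * math.log(t) / n)
        i = max(live, key=index)
        r = budgets[i].pop()               # scratch one ticket
        counts[i] += 1
        sums[i] += r
        total += r
    return total, counts
```

Because the Serfling factor tightens the confidence bound on heavily played arms, the resulting policy explores suboptimal arms less than plain UCB1 would, which is the effect the paper quantifies.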
Cite
Text
Féraud and Urvoy. "A Stochastic Bandit Algorithm for Scratch Games." Proceedings of the Fourth Asian Conference on Machine Learning, 2012.

Markdown

[Féraud and Urvoy. "A Stochastic Bandit Algorithm for Scratch Games." Proceedings of the Fourth Asian Conference on Machine Learning, 2012.](https://mlanthology.org/acml/2012/feraud2012acml-stochastic/)

BibTeX
@inproceedings{feraud2012acml-stochastic,
title = {{A Stochastic Bandit Algorithm for Scratch Games}},
author = {Féraud, Raphaël and Urvoy, Tanguy},
booktitle = {Proceedings of the Fourth Asian Conference on Machine Learning},
year = {2012},
pages = {129-143},
volume = {25},
url = {https://mlanthology.org/acml/2012/feraud2012acml-stochastic/}
}