A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback-Leibler Divergences
Abstract
We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than those of previously known algorithms with finite-time analyses (such as UCB-type algorithms).
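To illustrate the flavor of KL-based index policies discussed in the abstract, here is a minimal sketch of a KL-UCB-style index for Bernoulli rewards. This is not the paper's exact algorithm (which works with empirical distributions of finite support); it is the simpler Bernoulli special case, where an arm's index is the largest mean q whose KL divergence from the empirical mean stays within a log(t)/n budget. All function names and the bisection tolerance are illustrative choices, not taken from the paper.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence between Bernoulli(p) and Bernoulli(q),
    # with clipping to avoid log(0) at the boundary.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t, precision=1e-6):
    # Largest q in [p_hat, 1] with n * kl(p_hat, q) <= log(t),
    # found by bisection (kl(p_hat, .) is increasing on [p_hat, 1]).
    budget = math.log(max(t, 1)) / n
    lo, hi = p_hat, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

At each round one would pull the arm maximizing `kl_ucb_index(p_hat_k, n_k, t)`; because the KL "confidence interval" adapts to the divergence geometry rather than a fixed sub-Gaussian width, the resulting index is never larger than necessary, which is the source of the smaller main terms in the regret bounds.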
Cite
Text
Maillard et al. "A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback-Leibler Divergences." Proceedings of the 24th Annual Conference on Learning Theory, 2011.
Markdown
[Maillard et al. "A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback-Leibler Divergences." Proceedings of the 24th Annual Conference on Learning Theory, 2011.](https://mlanthology.org/colt/2011/maillard2011colt-finitetime/)
BibTeX
@inproceedings{maillard2011colt-finitetime,
title = {{A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback-Leibler Divergences}},
author = {Maillard, Odalric-Ambrym and Munos, Rémi and Stoltz, Gilles},
booktitle = {Proceedings of the 24th Annual Conference on Learning Theory},
year = {2011},
pages = {497--514},
volume = {19},
url = {https://mlanthology.org/colt/2011/maillard2011colt-finitetime/}
}