Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms
Abstract
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptotic regret matches the lower bound. The regret analysis of our algorithms relies on a new concentration inequality for weighted sums of KL divergences between the empirical distributions of rewards and their true distributions. For continuous Lipschitz bandits, we propose to first discretize the action space, and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently. This approach is shown, through numerical experiments, to significantly outperform existing algorithms that directly deal with the continuous set of arms. Finally, the results and algorithms are extended to contextual bandits with similarities.
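To make the discretize-then-index recipe in the abstract concrete, here is a minimal Python sketch, not the authors' OSLB/CKL-UCB code: it discretizes the arm space [0, 1] into K points and runs a plain per-arm KL-UCB index on the resulting discrete bandit, assuming Bernoulli rewards and a made-up 1-Lipschitz mean function `mu`. The paper's algorithms go further by coupling the arms through the Lipschitz constraint; this stand-in only illustrates the discretization step and the KL-based indices.

```python
import math
import random

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t):
    """Largest q >= p_hat with n * kl(p_hat, q) <= log t, via bisection."""
    level = math.log(max(t, 2)) / n
    lo, hi = p_hat, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if bernoulli_kl(p_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def mu(x):
    """Hypothetical 1-Lipschitz mean-reward function on [0, 1]."""
    return 0.9 - 0.8 * abs(x - 0.7)

def run(K=20, horizon=5000, seed=0):
    random.seed(seed)
    arms = [k / (K - 1) for k in range(K)]  # uniform discretization of [0, 1]
    counts = [0] * K
    sums = [0.0] * K
    best = max(mu(x) for x in arms)
    regret = 0.0  # pseudo-regret against the best discretized arm
    for t in range(1, horizon + 1):
        if t <= K:
            a = t - 1  # play each arm once to initialize
        else:
            a = max(range(K),
                    key=lambda k: kl_ucb_index(sums[k] / counts[k], counts[k], t))
        reward = 1.0 if random.random() < mu(arms[a]) else 0.0
        counts[a] += 1
        sums[a] += reward
        regret += best - mu(arms[a])
    return regret

if __name__ == "__main__":
    print(f"cumulative pseudo-regret after 5000 rounds: {run():.1f}")
```

Note the role of K in this sketch: a finer grid shrinks the discretization error (the Lipschitz assumption bounds the gap to the best continuous arm by L times half the grid spacing) but adds more arms to explore, so K must be tuned to the horizon.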
Cite
Text
Magureanu et al. "Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms." Annual Conference on Computational Learning Theory, 2014.

Markdown
[Magureanu et al. "Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms." Annual Conference on Computational Learning Theory, 2014.](https://mlanthology.org/colt/2014/magureanu2014colt-lipschitz/)

BibTeX
@inproceedings{magureanu2014colt-lipschitz,
title = {{Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms}},
author = {Magureanu, Stefan and Combes, Richard and Proutière, Alexandre},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2014},
pages = {975-999},
url = {https://mlanthology.org/colt/2014/magureanu2014colt-lipschitz/}
}