Online Optimization in X-Armed Bandits

Abstract

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space. We constraint the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous result for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $n$, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.

Cite

Text

Bubeck et al. "Online Optimization in X-Armed Bandits." Neural Information Processing Systems, 2008.

Markdown

[Bubeck et al. "Online Optimization in X-Armed Bandits." Neural Information Processing Systems, 2008.](https://mlanthology.org/neurips/2008/bubeck2008neurips-online/)

BibTeX

@inproceedings{bubeck2008neurips-online,
  title     = {{Online Optimization in X-Armed Bandits}},
  author    = {Bubeck, Sébastien and Stoltz, Gilles and Szepesvári, Csaba and Munos, Rémi},
  booktitle = {Neural Information Processing Systems},
  year      = {2008},
  pages     = {201-208},
  url       = {https://mlanthology.org/neurips/2008/bubeck2008neurips-online/}
}