Adaptivity to Smoothness in X-Armed Bandits

Abstract

We study the stochastic continuum-armed bandit problem from the angle of adaptivity to \emph{unknown regularity} of the reward function $f$. We prove that there exists no strategy for the cumulative regret that adapts optimally to the \emph{smoothness} of $f$. We show, however, that such minimax-optimal adaptive strategies exist if the learner is given \emph{extra information} about $f$. Finally, we complement our positive results with matching lower bounds.
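For context, a standard way to formalize "smoothness" in this setting is a Hölder condition on $f$; the exponent $\alpha$, constant $L$, and regret notation $R_T$ below are illustrative and not taken from the abstract. Under the classical one-dimensional Hölder assumption with $\alpha$ known, the minimax cumulative regret is known from prior work on continuum-armed bandits to scale as $T^{(\alpha+1)/(2\alpha+1)}$ up to logarithmic factors; the adaptivity question studied here is whether such a rate can still be attained when $\alpha$ is unknown. A sketch of this background:

$$ |f(x) - f(y)| \le L\,|x - y|^{\alpha}, \qquad x, y \in [0,1], \ \alpha \in (0,1], $$

$$ \inf_{\text{strategies}} \ \sup_{f \ \alpha\text{-H\"older}} \ \mathbb{E}\big[R_T\big] = \tilde{\Theta}\!\left(T^{\frac{\alpha+1}{2\alpha+1}}\right), \qquad R_T = \sum_{t=1}^{T} \Big(\sup_{x} f(x) - f(x_t)\Big), $$

where $x_t$ denotes the point queried at round $t$ and $T$ the horizon.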

Cite

Text

Locatelli and Carpentier. "Adaptivity to Smoothness in X-Armed Bandits." Annual Conference on Computational Learning Theory, 2018.

Markdown

[Locatelli and Carpentier. "Adaptivity to Smoothness in X-Armed Bandits." Annual Conference on Computational Learning Theory, 2018.](https://mlanthology.org/colt/2018/locatelli2018colt-adaptivity/)

BibTeX

@inproceedings{locatelli2018colt-adaptivity,
  title     = {{Adaptivity to Smoothness in X-Armed Bandits}},
  author    = {Locatelli, Andrea and Carpentier, Alexandra},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2018},
  pages     = {1463--1492},
  url       = {https://mlanthology.org/colt/2018/locatelli2018colt-adaptivity/}
}