Improved Regret for Zeroth-Order Stochastic Convex Bandits

Abstract

We present an efficient algorithm for stochastic bandit convex optimisation with no assumptions on smoothness or strong convexity and for which the regret is bounded by O(d^(4.5) sqrt(n) polylog(n)), where n is the number of interactions and d is the dimension.

Cite

Text

Lattimore and Gyorgy. "Improved Regret for Zeroth-Order Stochastic Convex Bandits." Conference on Learning Theory, 2021.

Markdown

[Lattimore and Gyorgy. "Improved Regret for Zeroth-Order Stochastic Convex Bandits." Conference on Learning Theory, 2021.](https://mlanthology.org/colt/2021/lattimore2021colt-improved/)

BibTeX

@inproceedings{lattimore2021colt-improved,
  title     = {{Improved Regret for Zeroth-Order Stochastic Convex Bandits}},
  author    = {Lattimore, Tor and Gyorgy, Andras},
  booktitle = {Conference on Learning Theory},
  year      = {2021},
  pages     = {2938-2964},
  volume    = {134},
  url       = {https://mlanthology.org/colt/2021/lattimore2021colt-improved/}
}