Improved Regret for Zeroth-Order Stochastic Convex Bandits
Abstract
We present an efficient algorithm for stochastic bandit convex optimisation with no assumptions on smoothness or strong convexity and for which the regret is bounded by O(d^(4.5) sqrt(n) polylog(n)), where n is the number of interactions and d is the dimension.
Cite
Text
Lattimore and Gyorgy. "Improved Regret for Zeroth-Order Stochastic Convex Bandits." Conference on Learning Theory, 2021.Markdown
[Lattimore and Gyorgy. "Improved Regret for Zeroth-Order Stochastic Convex Bandits." Conference on Learning Theory, 2021.](https://mlanthology.org/colt/2021/lattimore2021colt-improved/)BibTeX
@inproceedings{lattimore2021colt-improved,
title = {{Improved Regret for Zeroth-Order Stochastic Convex Bandits}},
author = {Lattimore, Tor and Gyorgy, Andras},
booktitle = {Conference on Learning Theory},
year = {2021},
pages = {2938-2964},
volume = {134},
url = {https://mlanthology.org/colt/2021/lattimore2021colt-improved/}
}