Stochastic Convex Optimization with Bandit Feedback
Abstract
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in X$. We demonstrate a generalization of the ellipsoid algorithm that incurs $O(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
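To make the feedback model and the regret notion concrete, here is a minimal sketch in Python. It is not the paper's generalized ellipsoid algorithm: the objective $f$, the domain $X$ (a unit ball), the noise level, the step sizes, and the two-point zeroth-order query strategy are all illustrative assumptions. It only shows what the algorithm observes (noisy values of $f$ at queried points) and how cumulative regret is accounted.

```python
# Sketch of the stochastic bandit feedback setting: the learner queries points
# in a convex, compact set X and observes noisy values of a convex Lipschitz f.
# The query strategy below (a two-point zeroth-order gradient estimate) is a
# placeholder, NOT the paper's ellipsoid-based method.
import numpy as np

rng = np.random.default_rng(0)

d = 2                                  # dimension of X (assumption)
X_radius = 1.0                         # X = Euclidean ball of radius 1 (assumption)
x_star = np.full(d, 0.3)               # minimizer of f over X (assumption)
f = lambda x: np.sum((x - x_star) ** 2)  # convex, Lipschitz on X (assumption)
sigma = 0.1                            # noise level (assumption)

def noisy_oracle(x):
    """Bandit feedback: a noisy realization of f(x)."""
    return f(x) + sigma * rng.standard_normal()

def project(x):
    """Euclidean projection onto X."""
    norm = np.linalg.norm(x)
    return x if norm <= X_radius else x * (X_radius / norm)

T = 5000
x = np.zeros(d)
regret = 0.0
for t in range(1, T + 1):
    delta = 1.0 / np.sqrt(t)           # shrinking perturbation radius
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)             # random unit direction
    x_plus, x_minus = project(x + delta * u), project(x - delta * u)
    y_plus, y_minus = noisy_oracle(x_plus), noisy_oracle(x_minus)
    g = d * (y_plus - y_minus) / (2 * delta) * u  # zeroth-order gradient estimate
    # Regret counts the true (noise-free) suboptimality of every queried point.
    regret += (f(x_plus) - f(x_star)) + (f(x_minus) - f(x_star))
    x = project(x - g / np.sqrt(t))    # projected step with 1/sqrt(t) step size

print(f"average regret after T={T} rounds: {regret / T:.4f}")
```

Any strategy achieving the paper's $O(\mathrm{poly}(d)\sqrt{T})$ bound would drive the average regret to zero at rate $O(\mathrm{poly}(d)/\sqrt{T})$; the sketch above only illustrates the interface, not that guarantee.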
Cite
Text
Agarwal et al. "Stochastic Convex Optimization with Bandit Feedback." Neural Information Processing Systems, 2011.
Markdown
[Agarwal et al. "Stochastic Convex Optimization with Bandit Feedback." Neural Information Processing Systems, 2011.](https://mlanthology.org/neurips/2011/agarwal2011neurips-stochastic/)
BibTeX
@inproceedings{agarwal2011neurips-stochastic,
title = {{Stochastic Convex Optimization with Bandit Feedback}},
author = {Agarwal, Alekh and Foster, Dean P. and Hsu, Daniel J. and Kakade, Sham M. and Rakhlin, Alexander},
booktitle = {Neural Information Processing Systems},
year = {2011},
pages = {1035--1043},
url = {https://mlanthology.org/neurips/2011/agarwal2011neurips-stochastic/}
}