Stochastic Convex Optimization with Bandit Feedback

Abstract

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in X$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus $T$ times the minimum value of $f$ over $X$. We demonstrate a generalization of the ellipsoid algorithm that incurs $O(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
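
For intuition about the feedback model, the sketch below simulates a noisy zeroth-order oracle and runs a simple two-point gradient-estimate method over a Euclidean ball. This is a minimal illustration under assumed settings (unit-ball feasible set, Gaussian noise, hypothetical step sizes); it is not the paper's ellipsoid-based algorithm, which is considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_oracle(f, x, noise_std=0.1):
    """Return a noisy realization of f(x), as in the bandit feedback model."""
    return f(x) + noise_std * rng.standard_normal()

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto a ball of the given radius (an example of X)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else radius * x / norm

def two_point_bandit_descent(f, d, T, step=0.05, delta=0.1):
    """Minimize f over the unit ball using only noisy function-value queries.

    Illustrative only: a two-point finite-difference estimator, not the
    ellipsoid-style method analyzed in the paper.
    """
    x = np.zeros(d)
    for t in range(T):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        # Two noisy queries give a crude directional-derivative estimate.
        plus = noisy_oracle(f, project_to_ball(x + delta * u))
        minus = noisy_oracle(f, project_to_ball(x - delta * u))
        grad_est = (d / (2.0 * delta)) * (plus - minus) * u
        x = project_to_ball(x - step / np.sqrt(t + 1) * grad_est)
    return x

if __name__ == "__main__":
    # Example: a convex, Lipschitz quadratic on the unit ball in R^5.
    x_star = np.full(5, 0.3)
    f = lambda x: np.linalg.norm(x - x_star) ** 2
    x_hat = two_point_bandit_descent(f, d=5, T=5000)
    print("final point:", np.round(x_hat, 3))
```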

Cite

Text

Agarwal et al. "Stochastic Convex Optimization with Bandit Feedback." Neural Information Processing Systems, 2011.

Markdown

[Agarwal et al. "Stochastic Convex Optimization with Bandit Feedback." Neural Information Processing Systems, 2011.](https://mlanthology.org/neurips/2011/agarwal2011neurips-stochastic/)

BibTeX

@inproceedings{agarwal2011neurips-stochastic,
  title     = {{Stochastic Convex Optimization with Bandit Feedback}},
  author    = {Agarwal, Alekh and Foster, Dean P. and Hsu, Daniel J. and Kakade, Sham M. and Rakhlin, Alexander},
  booktitle = {Neural Information Processing Systems},
  year      = {2011},
  pages     = {1035--1043},
  url       = {https://mlanthology.org/neurips/2011/agarwal2011neurips-stochastic/}
}