Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback

Abstract

The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). This setting models a decision maker who must make decisions in the face of adversarially chosen convex loss functions. Moreover, the only feedback the decision maker receives is the incurred loss values; the loss functions themselves are never revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e., convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret, ignoring constant and logarithmic factors, is $O(T^{2/3})$.
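As a rough illustration of the one-point gradient estimation idea the abstract refers to (in the style of Flaxman et al., not the paper's full algorithm, which combines it with a self-concordant regularizer), a single loss evaluation at a randomly perturbed point yields an unbiased estimate of the gradient of a smoothed version of the loss. The sketch below is a minimal two-dimensional version with assumed helper names; averaging many such estimates recovers the gradient of a linear loss:

```python
import math
import random

def one_point_gradient_estimate(f, x, delta, n_samples, rng):
    """Average the one-point estimator (d/delta) * f(x + delta*u) * u over
    n_samples directions u drawn uniformly from the unit circle (d = 2).
    Each sample uses only a single evaluation of the loss f, matching the
    bandit feedback model where only loss values are observed."""
    d = len(x)
    acc = [0.0] * d
    for _ in range(n_samples):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(theta), math.sin(theta))   # uniform unit vector
        y = [x[i] + delta * u[i] for i in range(d)]
        scale = (d / delta) * f(y)               # one bandit evaluation
        for i in range(d):
            acc[i] += scale * u[i]
    return [a / n_samples for a in acc]

# For a linear loss f(x) = 3*x1 + 2*x2, the smoothed loss equals f itself,
# so the averaged estimate should approach the true gradient (3, 2).
rng = random.Random(0)
f = lambda x: 3.0 * x[0] + 2.0 * x[1]
g = one_point_gradient_estimate(f, [0.0, 0.0], 0.1, 200000, rng)
```

Here `delta` trades off bias (against the unsmoothed gradient) and variance, which is the source of the $T^{2/3}$-type rates in this line of work.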

Cite

Text

Saha and Tewari. "Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.

Markdown

[Saha and Tewari. "Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.](https://mlanthology.org/aistats/2011/saha2011aistats-improved/)

BibTeX

@inproceedings{saha2011aistats-improved,
  title     = {{Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback}},
  author    = {Saha, Ankan and Tewari, Ambuj},
  booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  year      = {2011},
  pages     = {636--642},
  volume    = {15},
  url       = {https://mlanthology.org/aistats/2011/saha2011aistats-improved/}
}