Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization

Abstract

We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal $O^*(\sqrt{T})$ regret. The setting is a natural generalization of the nonstochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by previous approaches are overcome by the use of a self-concordant potential function. Our approach presents a novel connection between online learning and interior point methods.
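To make the abstract's idea concrete, here is a minimal sketch of a barrier-based bandit linear optimization step in the spirit the abstract describes: maintain an interior point, perturb it along an eigendirection of the barrier's Hessian (the Dikin ellipsoid), use the single scalar loss to build an unbiased gradient estimate, and take a follow-the-regularized-leader step with the barrier as regularizer. Everything here is illustrative, not the paper's exact algorithm or constants: the domain is assumed to be the unit ball with log-barrier $R(x) = -\log(1 - \|x\|^2)$, and the names and step size `eta` are placeholders.

```python
import numpy as np

def ball_barrier_hessian(x):
    """Hessian of R(x) = -log(1 - ||x||^2), a self-concordant barrier for the unit ball."""
    s = x @ x
    d = len(x)
    return (2.0 / (1.0 - s)) * np.eye(d) + (4.0 / (1.0 - s) ** 2) * np.outer(x, x)

def ftrl_ball_step(L, eta):
    """Closed-form argmin_x of eta*<L, x> - log(1 - ||x||^2).

    The minimizer lies along -L; solving the 1-D stationarity condition
    2t/(1-t^2) = eta*||L|| gives the radius t below.
    """
    norm = np.linalg.norm(L)
    if norm == 0.0:
        return np.zeros_like(L)
    t = (np.sqrt(1.0 + (eta * norm) ** 2) - 1.0) / (eta * norm)
    return -t * L / norm

def barrier_bandit(loss_vectors, eta=0.05, seed=0):
    """Hedged sketch of one barrier-based bandit loop (unit-ball domain assumed).

    Only the scalar loss f @ y is observed each round, never the vector f.
    """
    rng = np.random.default_rng(seed)
    d = loss_vectors.shape[1]
    cum = np.zeros(d)          # running sum of estimated loss vectors
    x = np.zeros(d)            # current interior point
    losses = []
    for f in loss_vectors:
        lam, V = np.linalg.eigh(ball_barrier_hessian(x))
        i = rng.integers(d)
        eps = rng.choice([-1.0, 1.0])
        # Perturb along a Dikin-ellipsoid axis; y stays inside the ball.
        y = x + eps * V[:, i] / np.sqrt(lam[i])
        loss = float(f @ y)                               # bandit feedback only
        f_hat = d * loss * eps * np.sqrt(lam[i]) * V[:, i]  # unbiased estimate of f
        cum += f_hat
        x = ftrl_ball_step(cum, eta)
        losses.append(loss)
    return x, np.array(losses)
```

The barrier keeps every iterate strictly interior, so the Hessian is always finite and the perturbed play `y` is always feasible; this is the interior-point connection the abstract refers to, rendered here in the simplest possible domain.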

Cite

Text

Abernethy et al. "Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization." Annual Conference on Computational Learning Theory, 2008.

Markdown

[Abernethy et al. "Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization." Annual Conference on Computational Learning Theory, 2008.](https://mlanthology.org/colt/2008/abernethy2008colt-competing/)

BibTeX

@inproceedings{abernethy2008colt-competing,
  title     = {{Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization}},
  author    = {Abernethy, Jacob D. and Hazan, Elad and Rakhlin, Alexander},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2008},
  pages     = {263--274},
  url       = {https://mlanthology.org/colt/2008/abernethy2008colt-competing/}
}