Multi-Armed Bandit Problems with History

Abstract

In this paper we consider the stochastic multi-armed bandit problem. However, unlike in the conventional version of this problem, we do not assume that the algorithm starts from scratch. Many applications offer observations of (some of) the arms even before the algorithm starts. We propose three novel multi-armed bandit algorithms that can exploit this data. An upper bound on the regret is derived in each case. The results show that a logarithmic amount of historical data can reduce regret from logarithmic to constant. The effectiveness of the proposed algorithms is demonstrated on a large-scale malicious URL detection problem.
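To illustrate the warm-start idea, the following is a minimal sketch in Python, assuming a standard UCB1-style index whose pull counts and empirical means are seeded from historical observations before online play begins. The class name WarmStartUCB and its interface are illustrative assumptions, not the paper's three algorithms or their exact index formulas.

import math
import random

class WarmStartUCB:
    # UCB1-style policy seeded with historical observations.
    # Illustrative sketch only; not the paper's exact algorithms.
    def __init__(self, n_arms, history=None):
        # history: optional dict {arm_index: list of observed rewards}
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        for arm, rewards in (history or {}).items():
            self.counts[arm] += len(rewards)
            self.sums[arm] += sum(rewards)
        # Historical samples count toward the effective time horizon.
        self.t = sum(self.counts)

    def select_arm(self):
        # Pull any arm that has no samples at all (historical or online).
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        self.t += 1
        ucb = [self.sums[a] / self.counts[a]
               + math.sqrt(2.0 * math.log(self.t) / self.counts[a])
               for a in range(len(self.counts))]
        return max(range(len(self.counts)), key=ucb.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

# Example: two Bernoulli arms, with 100 historical samples of arm 0.
random.seed(0)
means = [0.6, 0.5]
hist = {0: [float(random.random() < means[0]) for _ in range(100)]}
policy = WarmStartUCB(n_arms=2, history=hist)
for _ in range(1000):
    arm = policy.select_arm()
    policy.update(arm, float(random.random() < means[arm]))

In this sketch the historical samples shrink the confidence intervals of the pre-observed arms from the very first round, which is the intuition behind the paper's result that enough historical data can bring the regret down from logarithmic to constant.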

Cite

Text

Shivaswamy and Joachims. "Multi-Armed Bandit Problems with History." Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012.

Markdown

[Shivaswamy and Joachims. "Multi-Armed Bandit Problems with History." Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012.](https://mlanthology.org/aistats/2012/shivaswamy2012aistats-multiarmed/)

BibTeX

@inproceedings{shivaswamy2012aistats-multiarmed,
  title     = {{Multi-Armed Bandit Problems with History}},
  author    = {Shivaswamy, Pannagadatta and Joachims, Thorsten},
  booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  year      = {2012},
  pages     = {1046--1054},
  volume    = {22},
  url       = {https://mlanthology.org/aistats/2012/shivaswamy2012aistats-multiarmed/}
}