Algorithms for Adversarial Bandit Problems with Multiple Plays

Abstract

Adversarial bandit problems studied by Auer et al. [4] are multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theory to the case where $k$ ($\geq 1$) distinct actions are selected at each time step. As algorithms for solving our problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1], in addition to two existing algorithms (Exp3, ComBand [5]), in terms of time and space efficiency and the regret with respect to the best fixed action set. The extension of Exp3, called Exp3.M, performs best on all the measures: it runs in $O(K(\log k + 1))$ time and $O(K)$ space, and suffers at most $O(\sqrt{kTK\log(K/k)})$ regret, where $K$ is the number of possible actions and $T$ is the number of iterations. The regret upper bound we prove for Exp3.M generalizes the bound proved by Auer et al. for Exp3.
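
To make the abstract's description of Exp3.M concrete, the sketch below gives one plausible Python reading of its per-round steps: cap the largest weights so no marginal probability exceeds 1, compute marginal probabilities that sum to $k$, draw the $k$-action set by dependent rounding, and update the selected uncapped arms with importance-weighted reward estimates. The $[0,1]$ reward assumption, the helper names (dep_round, find_alpha, get_rewards), and the toy parameters are our own choices for illustration; this is a sketch, not the authors' reference implementation.

import math
import random

EPS = 1e-12

def dep_round(p):
    """Dependent rounding: draw a subset whose marginal inclusion
    probabilities match p, where sum(p) is an integer k. Each step
    makes at least one fractional entry integral, so at most K - 1
    iterations are needed."""
    p = list(p)
    frac = [i for i, v in enumerate(p) if EPS < v < 1.0 - EPS]
    while len(frac) >= 2:
        i, j = frac[0], frac[1]
        alpha = min(1.0 - p[i], p[j])
        beta = min(p[i], 1.0 - p[j])
        if random.random() < beta / (alpha + beta):
            p[i] += alpha
            p[j] -= alpha
        else:
            p[i] -= beta
            p[j] += beta
        frac = [x for x in (i, j) if EPS < p[x] < 1.0 - EPS] + frac[2:]
    return [i for i, v in enumerate(p) if v > 0.5]

def find_alpha(w, threshold):
    """Find the cap alpha with alpha / W' = threshold, where W' is the
    total weight after every weight >= alpha is capped at alpha."""
    ws = sorted(w, reverse=True)
    rest = sum(ws)  # will hold the sum of weights strictly below the top m
    for m in range(1, len(ws) + 1):
        rest -= ws[m - 1]
        denom = 1.0 - threshold * m
        if denom <= 0.0:
            break
        alpha = threshold * rest / denom
        below = ws[m] if m < len(ws) else 0.0
        if below <= alpha <= ws[m - 1]:
            return alpha
    return ws[0]  # numerical fallback; unreachable in exact arithmetic

def exp3m(K, k, gamma, T, get_rewards):
    """Illustrative Exp3.M loop: choose k of K arms per round.
    get_rewards(t, S) must return {arm: reward in [0, 1]} for arms in S."""
    w = [1.0] * K
    threshold = (1.0 / k - gamma / K) / (1.0 - gamma)
    for t in range(T):
        capped = set()
        w_prime = list(w)
        if max(w) >= threshold * sum(w):
            # Cap the largest weights so no marginal probability exceeds 1.
            alpha = find_alpha(w, threshold)
            for i in range(K):
                if w[i] >= alpha:
                    w_prime[i] = alpha
                    capped.add(i)
        total = sum(w_prime)
        # Marginal probabilities; they sum to k and each lies in (0, 1].
        p = [k * ((1.0 - gamma) * w_prime[i] / total + gamma / K)
             for i in range(K)]
        S = dep_round(p)
        rewards = get_rewards(t, S)
        for i in S:
            if i not in capped:  # capped weights are left unchanged
                xhat = rewards[i] / p[i]  # importance-weighted estimate
                w[i] *= math.exp(k * gamma * xhat / K)
    return w

# Toy run: 10 arms, pick 3 per round, rewards stubbed with noise.
if __name__ == "__main__":
    exp3m(K=10, k=3, gamma=0.1, T=1000,
          get_rewards=lambda t, S: {i: random.random() for i in S})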

Cite

Text

Uchiya et al. "Algorithms for Adversarial Bandit Problems with Multiple Plays." International Conference on Algorithmic Learning Theory, 2010. doi:10.1007/978-3-642-16108-7_30

Markdown

[Uchiya et al. "Algorithms for Adversarial Bandit Problems with Multiple Plays." International Conference on Algorithmic Learning Theory, 2010.](https://mlanthology.org/alt/2010/uchiya2010alt-algorithms/) doi:10.1007/978-3-642-16108-7_30

BibTeX

@inproceedings{uchiya2010alt-algorithms,
  title     = {{Algorithms for Adversarial Bandit Problems with Multiple Plays}},
  author    = {Uchiya, Taishi and Nakamura, Atsuyoshi and Kudo, Mineichi},
  booktitle = {International Conference on Algorithmic Learning Theory},
  year      = {2010},
  pages     = {375--389},
  doi       = {10.1007/978-3-642-16108-7_30},
  url       = {https://mlanthology.org/alt/2010/uchiya2010alt-algorithms/}
}