Posterior Tracking Algorithm for Classification Bandits

Abstract

The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold. To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.

Cite

Text

Tabata et al. "Posterior Tracking Algorithm for Classification Bandits." Artificial Intelligence and Statistics, 2023.

Markdown

[Tabata et al. "Posterior Tracking Algorithm for Classification Bandits." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/tabata2023aistats-posterior/)

BibTeX

@inproceedings{tabata2023aistats-posterior,
  title     = {{Posterior Tracking Algorithm for Classification Bandits}},
  author    = {Tabata, Koji and Komiyama, Junpei and Nakamura, Atsuyoshi and Komatsuzaki, Tamiki},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {10994-11022},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/tabata2023aistats-posterior/}
}