A PTAS for the Bayesian Thresholding Bandit Problem

Abstract

In this paper, we study the Bayesian thresholding bandit problem (BTBP), where the goal is to adaptively spend a budget of $Q$ queries on $n$ stochastic arms and determine a label for each arm (whether its mean reward is closer to $0$ or $1$). We present a polynomial-time approximation scheme (PTAS) for the BTBP with runtime $O(f(\epsilon) + Q)$ that achieves expected labeling accuracy at least $\opt(Q) - \epsilon$, where $f(\cdot)$ is a function that depends only on $\epsilon$ and $\opt(Q)$ is the optimal expected accuracy achieved by any algorithm. For any fixed $\epsilon > 0$, our algorithm runs in time linear in $Q$. The main algorithmic ideas include the discretization employed in PTASs for many dynamic-programming-based problems (such as Knapsack), together with problem-specific techniques such as proving an upper bound on the number of queries an almost optimal policy makes to any single arm and establishing a smoothness property of the $\opt(\cdot)$ curve.
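To convey the dynamic-programming-with-discretization flavor mentioned in the abstract, the following is a minimal, hypothetical sketch and not the paper's PTAS: it allocates a query budget $Q$ across $n$ arms non-adaptively, assuming an illustrative per-arm curve acc(q) giving the expected accuracy of labeling one arm after q queries. The names allocate_budget, acc, and toy_acc, and all parameter values, are invented for illustration; the actual algorithm handles adaptive policies and a discretized state space.

# Illustrative sketch only (not the paper's algorithm): a simple dynamic program
# that splits a query budget Q across n arms to maximize total expected labeling
# accuracy, assuming a hypothetical per-arm curve acc(q) = expected accuracy of
# labeling one arm after q queries. The paper's PTAS additionally discretizes
# states (as in the Knapsack PTAS) and optimizes over adaptive query policies;
# this static allocation only conveys the DP flavor.

def allocate_budget(n, Q, acc):
    """Return per-arm query counts maximizing sum of acc(q_i) s.t. sum q_i <= Q."""
    NEG = float("-inf")
    # best[b] = best total accuracy using exactly b queries on the arms seen so far
    best = [0.0] + [NEG] * Q
    choice = [[0] * (Q + 1) for _ in range(n)]
    for i in range(n):
        new = [NEG] * (Q + 1)
        for b in range(Q + 1):
            if best[b] == NEG:
                continue
            for q in range(Q - b + 1):
                val = best[b] + acc(q)
                if val > new[b + q]:
                    new[b + q] = val
                    choice[i][b + q] = q
        best = new
    # Recover the allocation by backtracking from the best reachable budget.
    b = max(range(Q + 1), key=lambda x: best[x])
    alloc = [0] * n
    for i in reversed(range(n)):
        alloc[i] = choice[i][b]
        b -= alloc[i]
    return alloc

# Example usage with a toy accuracy curve that saturates as queries increase.
if __name__ == "__main__":
    toy_acc = lambda q: 1.0 - 0.5 / (1 + q)   # hypothetical curve, for illustration only
    print(allocate_budget(n=3, Q=10, acc=toy_acc))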

Cite

Text

Peng et al. "A PTAS for the Bayesian Thresholding Bandit Problem." Artificial Intelligence and Statistics, 2020.

Markdown

[Peng et al. "A PTAS for the Bayesian Thresholding Bandit Problem." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/peng2020aistats-ptas/)

BibTeX

@inproceedings{peng2020aistats-ptas,
  title     = {{A PTAS for the Bayesian Thresholding Bandit Problem}},
  author    = {Peng, Jian and Qin, Yue and Wei, Yadi and Zhou, Yuan},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {2455--2464},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/peng2020aistats-ptas/}
}