Good Arm Identification via Bandit Feedback

Abstract

We consider a novel stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeats a process of outputting an arm as soon as it is identified as a good one before confirming the other arms are actually not good. The objective of GAI is to minimize the number of samples for each process. We find that GAI faces a new kind of dilemma, the exploration-exploitation dilemma of confidence, which is different from the best arm identification. As a result, an efficient design of algorithms for GAI is quite different from that for the best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to the logarithmic factor O(log1δ)\documentclass[12pt]minimal \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}-69pt \begin{document}$\mathrm {O}(\log \frac{1}{\delta })$\end{document} for acceptance error rate δ\documentclass[12pt]minimal \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}-69pt \begin{document}$\delta $\end{document}. We also develop an algorithm whose sample complexity almost matches the lower bound. We also confirm experimentally that our proposed algorithm outperforms naive algorithms in synthetic settings based on a conventional bandit problem and clinical trial researches for rheumatoid arthritis.

Cite

Text

Kano et al. "Good Arm Identification via Bandit Feedback." Machine Learning, 2019. doi:10.1007/S10994-019-05784-4

Markdown

[Kano et al. "Good Arm Identification via Bandit Feedback." Machine Learning, 2019.](https://mlanthology.org/mlj/2019/kano2019mlj-good/) doi:10.1007/S10994-019-05784-4

BibTeX

@article{kano2019mlj-good,
  title     = {{Good Arm Identification via Bandit Feedback}},
  author    = {Kano, Hideaki and Honda, Junya and Sakamaki, Kentaro and Matsuura, Kentaro and Nakamura, Atsuyoshi and Sugiyama, Masashi},
  journal   = {Machine Learning},
  year      = {2019},
  pages     = {721-745},
  doi       = {10.1007/S10994-019-05784-4},
  volume    = {108},
  url       = {https://mlanthology.org/mlj/2019/kano2019mlj-good/}
}