Generalized Objectives in Adaptive Experiments: The Frontier Between Regret and Speed

Abstract

This paper formulates a generalized model of multi-armed bandit experiments that accommodates both cumulative regret minimization and best-arm identification objectives. We identify the optimal instance-dependent scaling of the cumulative cost across experimentation and deployment, which is expressed in the familiar form uncovered by Lai and Robbins (1985). We show that the nature of asymptotically efficient algorithms is nearly independent of the cost functions, highlighting a remarkable universality phenomenon. Balancing the various cost considerations reduces to an appropriate choice of exploitation rate. Additionally, we explore the Pareto frontier between the length of the experiment and the cumulative regret across experimentation and deployment. A notable and universal feature is that even a slight reduction in the exploitation rate (from one to a slightly lower value) results in a substantial decrease in the experiment's length, accompanied by only a minimal increase in the cumulative regret.
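
For context, the "familiar form" is the classical Lai-Robbins instance-dependent lower bound on cumulative regret, stated here in its standard bandit form (this is background, not a statement quoted from the paper):

$$
\liminf_{T \to \infty} \frac{\mathbb{E}[\mathrm{Regret}(T)]}{\log T} \;\ge\; \sum_{i \neq i^*} \frac{\Delta_i}{\mathrm{KL}(\nu_i \,\|\, \nu_{i^*})},
$$

where $\Delta_i$ is the gap between the mean reward of arm $i$ and that of the optimal arm $i^*$, and $\mathrm{KL}(\nu_i \,\|\, \nu_{i^*})$ is the Kullback-Leibler divergence between their reward distributions.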
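The role of an exploitation rate can be illustrated with top-two Thompson sampling, where a tunable probability of playing the posterior-best arm acts as the exploitation rate. This is a minimal illustrative sketch only, not the paper's algorithm; the function name and the parameter `beta_exploit` are hypothetical.

```python
import numpy as np


def top_two_thompson(means, horizon, beta_exploit=0.9, seed=0):
    """Top-two Thompson sampling for Bernoulli arms (illustrative sketch).

    beta_exploit is the probability of playing the posterior-best arm
    (the "leader"); lowering it from one shifts samples toward a
    "challenger" arm, trading a little regret for faster identification.
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    successes = np.ones(k)  # Beta(1, 1) priors over each arm's mean
    failures = np.ones(k)
    best_mean = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Leader: the best arm under one posterior sample.
        leader = int(np.argmax(rng.beta(successes, failures)))
        arm = leader
        if rng.random() > beta_exploit:
            # Challenger: resample until a different arm looks best
            # (bounded to avoid rare long loops late in the run).
            for _ in range(100):
                candidate = int(np.argmax(rng.beta(successes, failures)))
                if candidate != leader:
                    arm = candidate
                    break
        reward = float(rng.random() < means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        regret += best_mean - means[arm]
    return regret


# Lowering the exploitation rate slightly below one concentrates samples
# on the top contenders faster, at a small cost in cumulative regret.
for b in (1.0, 0.9, 0.5):
    print(b, round(top_two_thompson([0.5, 0.45, 0.3], 5000, beta_exploit=b), 1))
```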

Cite

Text

Qin and Russo. "Generalized Objectives in Adaptive Experiments: The Frontier Between Regret and Speed." NeurIPS 2023 Workshops: ReALML, 2023.

Markdown

[Qin and Russo. "Generalized Objectives in Adaptive Experiments: The Frontier Between Regret and Speed." NeurIPS 2023 Workshops: ReALML, 2023.](https://mlanthology.org/neuripsw/2023/qin2023neuripsw-generalized/)

BibTeX

@inproceedings{qin2023neuripsw-generalized,
  title     = {{Generalized Objectives in Adaptive Experiments: The Frontier Between Regret and Speed}},
  author    = {Qin, Chao and Russo, Daniel},
  booktitle = {NeurIPS 2023 Workshops: ReALML},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/qin2023neuripsw-generalized/}
}