Adaptive Experimental Design for Policy Learning: Contextual Best Arm Identification

Abstract

This study investigates the contextual best arm identification (BAI) problem, aiming to design an adaptive experiment that identifies the best treatment arm conditional on contextual information (covariates). We consider a decision-maker who assigns treatment arms to experimental units during the experiment and, at its end, recommends the estimated best treatment arm given the contexts. The decision-maker makes recommendations via a policy, a function that maps contexts to the estimated best treatment arm. We evaluate a policy by its worst-case \emph{expected simple regret}, the difference between the expected outcomes of an optimal policy and those of the learned policy. We derive a lower bound for the worst-case expected simple regret and then propose a strategy called \emph{Adaptive Sampling-Policy Learning} (PLAS). We prove that this strategy is minimax rate-optimal in the sense that the leading factor in its regret upper bound matches the lower bound as the number of experimental units increases.
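To make the setting concrete, the following is a minimal simulation sketch of contextual BAI. It is not the paper's PLAS strategy: the allocation rule here is plain uniform sampling standing in for the adaptive sampling step, and the context distribution, arm means, and horizon are all illustrative assumptions. At the end, the learned policy recommends the empirically best arm per context, and we compute its expected simple regret against the true best arms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance (not from the paper): 2 contexts, 3 arms,
# Gaussian outcomes with context-dependent means.
means = np.array([[0.2, 0.5, 0.4],   # context 0
                  [0.6, 0.3, 0.1]])  # context 1
n_contexts, n_arms = means.shape
T = 5000  # number of experimental units

# Adaptive experiment phase: uniform allocation is used here as a
# placeholder for an adaptive sampling rule.
counts = np.zeros_like(means)
sums = np.zeros_like(means)
for t in range(T):
    x = rng.integers(n_contexts)        # observe the unit's context
    a = rng.integers(n_arms)            # assign a treatment arm
    y = rng.normal(means[x, a], 1.0)    # observe the outcome
    counts[x, a] += 1
    sums[x, a] += y

# Policy learning phase: map each context to its empirically best arm.
est = sums / np.maximum(counts, 1)
policy = est.argmax(axis=1)

# Expected simple regret of the learned policy: the gap between the
# optimal arm's mean and the recommended arm's mean, averaged over a
# uniform context distribution.
regret = np.mean(means.max(axis=1) - means[np.arange(n_contexts), policy])
print(policy, regret)
```

A minimax-optimal strategy would replace the uniform allocation with a sampling rule that adapts arm-assignment probabilities to the observed contexts and outcomes, which is what drives the matching upper and lower bounds in the paper.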

Cite

Text

Kato et al. "Adaptive Experimental Design for Policy Learning: Contextual Best Arm Identification." ICML 2024 Workshops: RLControlTheory, 2024.

Markdown

[Kato et al. "Adaptive Experimental Design for Policy Learning: Contextual Best Arm Identification." ICML 2024 Workshops: RLControlTheory, 2024.](https://mlanthology.org/icmlw/2024/kato2024icmlw-adaptive/)

BibTeX

@inproceedings{kato2024icmlw-adaptive,
  title     = {{Adaptive Experimental Design for Policy Learning: Contextual Best Arm Identification}},
  author    = {Kato, Masahiro and Okumura, Kyohei and Ishihara, Takuya and Kitagawa, Toru},
  booktitle = {ICML 2024 Workshops: RLControlTheory},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/kato2024icmlw-adaptive/}
}