Residual Bootstrap Exploration for Stochastic Linear Bandit

Abstract

We propose a new bootstrap-based online algorithm for stochastic linear bandit problems. The key idea is to adopt residual bootstrap exploration, in which the agent estimates the next-step reward by re-sampling the residuals of the mean-reward estimate. Our algorithm, residual bootstrap exploration for stochastic linear bandit (`LinReBoot`), estimates the linear reward from its re-sampling distribution and pulls the arm with the highest reward estimate. In particular, we contribute a theoretical framework that demystifies residual bootstrap-based exploration mechanisms in stochastic linear bandit problems. The key insight is that the strength of bootstrap exploration rests on a collaborative optimism between the online-learned model and the re-sampling distribution of residuals. This observation enables us to show that the proposed `LinReBoot` secures a high-probability $\tilde{O}(d \sqrt{n})$ sub-linear regret under mild conditions. Our experiments support the easy generalizability of the `ReBoot` principle across various formulations of the linear bandit problem and demonstrate the significant computational efficiency of `LinReBoot`.
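For intuition, below is a minimal NumPy sketch of one round of residual-bootstrap exploration in the spirit the abstract describes: fit a ridge estimate, re-sample its residuals to perturb the fitted rewards, re-fit, and act greedily on the perturbed estimate. The function name `reboot_choose_arm`, the regularizer `lam`, and the sign-flip resampling scheme are illustrative assumptions, not the paper's exact `LinReBoot` procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def reboot_choose_arm(X_hist, y_hist, arms, lam=1.0):
    """One round of residual-bootstrap arm selection (illustrative sketch).

    X_hist: (t, d) past contexts, y_hist: (t,) past rewards,
    arms:   (K, d) feature vectors of the candidate arms.
    """
    t, d = X_hist.shape
    A = X_hist.T @ X_hist + lam * np.eye(d)
    theta_hat = np.linalg.solve(A, X_hist.T @ y_hist)    # ridge estimate
    residuals = y_hist - X_hist @ theta_hat              # fitted residuals
    # Bootstrap step: perturb fitted rewards with re-sampled, sign-flipped residuals
    boot = rng.choice(residuals, size=t, replace=True) * rng.choice([-1.0, 1.0], size=t)
    y_boot = X_hist @ theta_hat + boot
    theta_tilde = np.linalg.solve(A, X_hist.T @ y_boot)  # perturbed estimate
    return int(np.argmax(arms @ theta_tilde))            # greedy w.r.t. bootstrap estimate

# Toy usage: pick among 3 arms in R^2 after 5 observed rounds
X = rng.normal(size=(5, 2))
y = X @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=5)
arms = rng.normal(size=(3, 2))
print(reboot_choose_arm(X, y, arms))
```

Perturbing the fitted rewards with re-sampled residuals, rather than injecting noise into the estimate directly, is what makes the exploration data-driven: the scale of the perturbation adapts to the residuals observed so far.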

Cite

Text

Wu et al. "Residual Bootstrap Exploration for Stochastic Linear Bandit." Uncertainty in Artificial Intelligence, 2022.

Markdown

[Wu et al. "Residual Bootstrap Exploration for Stochastic Linear Bandit." Uncertainty in Artificial Intelligence, 2022.](https://mlanthology.org/uai/2022/wu2022uai-residual/)

BibTeX

@inproceedings{wu2022uai-residual,
  title     = {{Residual Bootstrap Exploration for Stochastic Linear Bandit}},
  author    = {Wu, Shuang and Wang, Chi-Hua and Li, Yuantong and Cheng, Guang},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2022},
  pages     = {2117--2127},
  volume    = {180},
  url       = {https://mlanthology.org/uai/2022/wu2022uai-residual/}
}