Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms

Abstract

A hierarchical Bayesian bandit is a multi-task bandit problem in which the bandit tasks are assumed to be drawn from a common distribution. In this work, we provide improved Bayes regret bounds for hierarchical Bayesian bandit algorithms in the multi-task linear bandit and semi-bandit settings. For the multi-task linear bandit, we first analyze the existing hierarchical Thompson sampling (HierTS) algorithm and improve its gap-independent Bayes regret bound from $O(m\sqrt{n\log{n}\log{(mn)}})$ to $O(m\sqrt{n\log{n}})$ in the case of an infinite action set, where $m$ is the number of tasks and $n$ the number of iterations per task. In the case of a finite action set, we propose a novel hierarchical Bayesian bandit algorithm, named hierarchical BayesUCB (HierBayesUCB), that achieves a logarithmic but gap-dependent regret bound $O(m\log{(mn)}\log{n})$ under mild assumptions. All of the above regret bounds hold for many variants of the hierarchical Bayesian linear bandit problem, including settings where the tasks are solved sequentially or concurrently. Furthermore, we extend the HierTS and HierBayesUCB algorithms to the multi-task combinatorial semi-bandit setting. Concretely, our combinatorial HierTS algorithm attains a Bayes regret bound of $O(m\sqrt{n}\log{n})$, comparable to the most recent existing bound, while our combinatorial HierBayesUCB yields a sharper Bayes regret bound of $O(m\log{(mn)}\log{n})$. Experiments are conducted to validate the soundness of our theoretical results for multi-task bandit algorithms.
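
For intuition only, below is a minimal sketch of hierarchical Thompson sampling on a simplified multi-task Gaussian multi-armed bandit (scalar arm means shared through a Gaussian hyper-prior), not the paper's exact HierTS for linear or combinatorial semi-bandits. All problem sizes and noise scales (m, K, n, mu_q, sigma_q, sigma_0, sigma) are illustrative placeholders chosen for this sketch, not values from the paper.

# Hedged sketch: hierarchical Thompson sampling on a toy multi-task
# Gaussian K-armed bandit. NOT the paper's HierTS for linear/semi-bandits;
# all constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

m, K, n = 5, 10, 200           # tasks, arms per task, rounds per task
mu_q, sigma_q = 0.0, 1.0       # hyper-prior over the shared arm means
sigma_0, sigma = 0.5, 1.0      # task-level spread and observation noise

# Simulated environment: shared means mu_star, then per-task means theta.
mu_star = rng.normal(mu_q, sigma_q, size=K)
theta = rng.normal(mu_star, sigma_0, size=(m, K))

# Sufficient statistics of observed rewards, per task and arm.
counts = np.zeros((m, K))
sums = np.zeros((m, K))

for t in range(n):
    for s in range(m):                      # tasks played concurrently
        # Hyper-posterior over mu_star (per arm): each task's sample mean
        # is a Gaussian observation of mu_star with variance
        # sigma_0^2 + sigma^2 / counts (ignored while an arm is unplayed).
        obs_var = sigma_0**2 + sigma**2 / np.maximum(counts, 1e-12)
        prec = 1.0 / sigma_q**2 + np.where(counts > 0, 1.0 / obs_var, 0.0).sum(0)
        mean = (mu_q / sigma_q**2
                + np.where(counts > 0,
                           (sums / np.maximum(counts, 1)) / obs_var, 0.0).sum(0)) / prec
        mu_sample = rng.normal(mean, 1.0 / np.sqrt(prec))   # sample hyper-parameter

        # Task posterior for task s, conditioned on the sampled hyper-parameter.
        task_prec = 1.0 / sigma_0**2 + counts[s] / sigma**2
        task_mean = (mu_sample / sigma_0**2 + sums[s] / sigma**2) / task_prec
        theta_sample = rng.normal(task_mean, 1.0 / np.sqrt(task_prec))

        a = int(np.argmax(theta_sample))    # act greedily w.r.t. the sample
        r = rng.normal(theta[s, a], sigma)  # observe a noisy reward
        counts[s, a] += 1
        sums[s, a] += r

The key design point the sketch illustrates is the two-level posterior: data from all tasks shrinks the hyper-posterior over the shared parameter, and each task's sample is then drawn conditionally on that shared sample, which is what lets a hierarchical algorithm transfer information across tasks.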

Cite

Text

Guan and Xiong. "Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms." Neural Information Processing Systems, 2024. doi:10.52202/079017-2323

Markdown

[Guan and Xiong. "Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/guan2024neurips-improved/) doi:10.52202/079017-2323

BibTeX

@inproceedings{guan2024neurips-improved,
  title     = {{Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms}},
  author    = {Guan, Jiechao and Xiong, Hui},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2323},
  url       = {https://mlanthology.org/neurips/2024/guan2024neurips-improved/}
}