Efficient Algorithms for Generalized Linear Bandits with Heavy-Tailed Rewards

Abstract

This paper investigates the problem of generalized linear bandits with heavy-tailed rewards, whose $(1+\epsilon)$-th moment is bounded for some $\epsilon\in (0,1]$. Although there exist methods for generalized linear bandits, most of them assume bounded or sub-Gaussian rewards and are therefore ill-suited to many real-world scenarios, such as financial markets and web advertising. To address this issue, we propose two novel algorithms based on truncation and mean of medians. These algorithms achieve an almost optimal regret bound of $\widetilde{O}(dT^{\frac{1}{1+\epsilon}})$, where $d$ is the dimension of the contextual information and $T$ is the time horizon. Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches. Additionally, our mean-of-medians-based algorithm requires only $O(\log T)$ rewards and one estimator per epoch, making it more practical. Moreover, when $\epsilon=1$, our algorithms improve the regret bounds of existing algorithms by a logarithmic factor. Numerical experiments confirm the merits of our algorithms.
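As background for the two estimator families the abstract names, the sketch below shows the classical robust mean estimators they build on: a truncated empirical mean and a median-of-means estimator (a close relative of the paper's mean-of-medians construction). This is an illustrative sketch, not the paper's algorithm; the truncation threshold schedule and the group count are assumptions taken from the standard analyses of these estimators, and the moment bound `v` plays the role of the bounded $(1+\epsilon)$-th moment.

```python
import numpy as np

def truncated_mean(rewards, epsilon, v, delta=0.05):
    """Truncation-based robust mean estimator for samples whose
    (1+epsilon)-th raw moment is bounded by v.

    Samples exceeding a threshold B in magnitude are zeroed out,
    trading a small, controlled bias for much lower variance under
    heavy tails.  The threshold schedule below follows the classical
    truncated-empirical-mean analysis (an assumption, not necessarily
    the paper's exact choice).
    """
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)
    B = (v * n / np.log(1.0 / delta)) ** (1.0 / (1.0 + epsilon))
    clipped = np.where(np.abs(rewards) <= B, rewards, 0.0)
    return float(clipped.mean())

def median_of_means(rewards, k):
    """Median-of-means estimator: split the samples into k groups,
    average within each group, and return the median of the group
    means.  The median step suppresses the influence of the rare
    extreme samples a heavy-tailed distribution produces.
    """
    groups = np.array_split(np.asarray(rewards, dtype=float), k)
    return float(np.median([g.mean() for g in groups]))

if __name__ == "__main__":
    # Heavy-tailed demo: Student-t with 1.5 degrees of freedom has a
    # finite mean (zero) but infinite variance, so its (1+eps)-th
    # moment is bounded only for eps < 0.5.
    rng = np.random.default_rng(0)
    x = rng.standard_t(df=1.5, size=20000)
    print("truncated mean:  ", truncated_mean(x, epsilon=0.4, v=1.0))
    print("median of means: ", median_of_means(x, k=50))
```

Both estimators concentrate around the true mean at rates a plain empirical average cannot achieve once the variance is infinite, which is the statistical leverage the paper's bandit algorithms exploit inside their online estimation steps.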

Cite

Text

Xue et al. "Efficient Algorithms for Generalized Linear Bandits with Heavy-Tailed Rewards." Neural Information Processing Systems, 2023.

Markdown

[Xue et al. "Efficient Algorithms for Generalized Linear Bandits with Heavy-Tailed Rewards." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/xue2023neurips-efficient/)

BibTeX

@inproceedings{xue2023neurips-efficient,
  title     = {{Efficient Algorithms for Generalized Linear Bandits with Heavy-Tailed Rewards}},
  author    = {Xue, Bo and Wang, Yimu and Wan, Yuanyu and Yi, Jinfeng and Zhang, Lijun},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/xue2023neurips-efficient/}
}