Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update

Abstract

We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.
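To make the idea of a one-pass update concrete, the following is a minimal sketch for the Bernoulli (logistic) case. It is not the paper's OMD estimator or its confidence set: it swaps in a plain online Newton-style step with a Sherman-Morrison inverse update, omits any projection onto a parameter domain, and uses a generic optimistic index. The class name `OnePassLogisticEstimator` and the parameters `reg`, `step`, and `beta` are hypothetical choices made for illustration. The point it shows is that each round touches only the current sample, so the per-round cost depends on the dimension but not on the number of rounds played so far.

```python
import math


def sigmoid(z):
    """Numerically stable logistic link mu(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in M]


class OnePassLogisticEstimator:
    """Illustrative one-pass estimator (assumption: NOT the paper's exact update)."""

    def __init__(self, dim, reg=1.0, step=1.0):
        self.theta = [0.0] * dim
        # Inverse of the regularized curvature matrix, initialized to I / reg.
        self.H_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.step = step

    def update(self, x, reward):
        """Consume one (feature, reward) pair; no past data is stored."""
        d = len(x)
        mu = sigmoid(dot(self.theta, x))
        w = mu * (1.0 - mu)  # local curvature of the logistic loss
        # Sherman-Morrison: (H + w x x^T)^{-1} computed from H^{-1} in O(d^2).
        Hx = mat_vec(self.H_inv, x)
        denom = 1.0 + w * dot(x, Hx)
        for i in range(d):
            for j in range(d):
                self.H_inv[i][j] -= w * Hx[i] * Hx[j] / denom
        # Newton-style step along the preconditioned gradient (mu - reward) * x.
        direction = mat_vec(self.H_inv, x)
        for i in range(d):
            self.theta[i] -= self.step * (mu - reward) * direction[i]

    def optimistic_index(self, x, beta=1.0):
        """Estimated linear score plus an exploration width (beta is hypothetical)."""
        width = math.sqrt(max(dot(x, mat_vec(self.H_inv, x)), 0.0))
        return dot(self.theta, x) + beta * width
```

In a bandit loop, one would call `optimistic_index` on each candidate arm, play the arm with the largest value, and then call `update` once with the observed reward. The state kept between rounds is only `theta` and `H_inv`, which is what keeps memory and time per round independent of the horizon in this sketch.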

Cite

Text

Zhang et al. "Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhang et al. "Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-generalized/)

BibTeX

@inproceedings{zhang2025neurips-generalized,
  title     = {{Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update}},
  author    = {Zhang, Yu-Jie and Xu, Sheng-An and Zhao, Peng and Sugiyama, Masashi},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhang2025neurips-generalized/}
}