Laplacian Kernelized Bandit

Abstract

We study multi-user contextual bandits where users are related by a graph and their reward functions exhibit both non-linear behavior and graph homophily. We introduce a principled joint penalty for the collection of user reward functions $\\{f_u\\}$, combining a graph smoothness term based on RKHS distances with an individual roughness penalty. Our central contribution is proving that this penalty is equivalent to the squared norm within a single, unified _multi-user RKHS_. We explicitly derive its reproducing kernel, which elegantly fuses the graph Laplacian with the base arm kernel. This unification allows us to reframe the problem as learning a single "lifted" function, enabling the design of principled algorithms, LK-GP-UCB and LK-GP-TS, that leverage Gaussian Process posteriors over this new kernel for exploration. We provide high-probability regret bounds that scale with an _effective dimension_ of the multi-user kernel, replacing dependencies on user count or ambient dimension. Empirically, our methods outperform strong linear and non-graph-aware baselines in non-linear settings and remain competitive even when the true rewards are linear. Our work delivers a unified, theoretically grounded, and practical framework that bridges Laplacian regularization with kernelized bandits for structured exploration.

Cite

Text

Wu and Amini. "Laplacian Kernelized Bandit." International Conference on Learning Representations, 2026.

Markdown

[Wu and Amini. "Laplacian Kernelized Bandit." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wu2026iclr-laplacian/)

BibTeX

@inproceedings{wu2026iclr-laplacian,
  title     = {{Laplacian Kernelized Bandit}},
  author    = {Wu, Shuang and Amini, Arash A.},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wu2026iclr-laplacian/}
}