Bayesian Analysis of Combinatorial Gaussian Process Bandits

Abstract

We consider the combinatorial volatile Gaussian process (GP) semi-bandit problem. In each round, an agent is presented with a set of available base arms and must select a subset of them to maximize the long-term cumulative reward. We study the Bayesian setting and provide novel Bayesian cumulative regret bounds for three GP-based algorithms: GP-UCB, GP-BayesUCB and GP-TS. Our bounds extend previous results for GP-UCB and GP-TS to the *infinite*, *volatile* and *combinatorial* setting, and to the best of our knowledge, we provide the first regret bound for GP-BayesUCB. Volatile arms encompass other widely considered bandit problems such as contextual bandits. Furthermore, we employ our framework to address the challenging real-world problem of online energy-efficient navigation, where we demonstrate its effectiveness compared to alternative approaches.
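
To make the problem setting concrete, below is a minimal sketch of a GP-UCB round loop for a combinatorial volatile semi-bandit under a simple top-K cardinality constraint. The subset size `K`, exploration weight `BETA`, kernel, and toy reward function are illustrative assumptions rather than the paper's exact configuration, and scikit-learn's `GaussianProcessRegressor` stands in for the GP posterior update.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
K, BETA, N_ROUNDS, DIM = 3, 2.0, 50, 2  # subset size and exploration weight (assumed values)

def reward(X):
    # Toy stand-in for the unknown expected reward of each base arm.
    return np.sin(3 * X[:, 0]) + X[:, 1]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)
X_hist, y_hist = [], []

for t in range(N_ROUNDS):
    # Volatile setting: a fresh set of available base arms arrives each round.
    arms = rng.uniform(-1.0, 1.0, size=(20, DIM))

    if X_hist:
        gp.fit(np.vstack(X_hist), np.array(y_hist))
        mu, sigma = gp.predict(arms, return_std=True)
    else:  # no observations yet: fall back on the GP prior
        mu, sigma = np.zeros(len(arms)), np.ones(len(arms))

    # UCB index per base arm; a top-K oracle maximizes the sum of indices.
    ucb = mu + BETA * sigma
    chosen = np.argsort(ucb)[-K:]

    # Semi-bandit feedback: a noisy reward is observed for every selected arm.
    y = reward(arms[chosen]) + 0.1 * rng.standard_normal(K)
    X_hist.extend(arms[chosen])
    y_hist.extend(y)
```

In this sketch, GP-TS would replace the UCB index with a sample from the posterior, and GP-BayesUCB would use a posterior quantile; the subset-selection oracle is unchanged.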

Cite

Text

Sandberg et al. "Bayesian Analysis of Combinatorial Gaussian Process Bandits." International Conference on Learning Representations, 2025.

Markdown

[Sandberg et al. "Bayesian Analysis of Combinatorial Gaussian Process Bandits." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/sandberg2025iclr-bayesian/)

BibTeX

```bibtex
@inproceedings{sandberg2025iclr-bayesian,
  title     = {{Bayesian Analysis of Combinatorial Gaussian Process Bandits}},
  author    = {Sandberg, Jack and Åkerblom, Niklas and Chehreghani, Morteza Haghir},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/sandberg2025iclr-bayesian/}
}
```