Clus-UCB: A Near-Optimal Algorithm for Clustered Bandits

Abstract

We study a stochastic multi-armed bandit setting in which the arms are partitioned into known clusters, such that the parameters of arms within a cluster differ by at most a known threshold. While the clustering structure is known a priori, the arm parameters are unknown. We derive an asymptotic lower bound on the regret that improves upon the classical bound of Lai & Robbins (1985). We then propose Clus-UCB, an efficient algorithm that closely matches this lower bound asymptotically by exploiting the clustering structure: it introduces a new index for evaluating an arm that depends on the other arms in the same cluster, so that arms share information with one another. We present simulation results for our algorithm and compare its performance against KL-UCB and other well-known algorithms for bandits with dependent arms. We discuss the robustness of the proposed algorithm under misspecified prior information, address some limitations of this work, and conclude by outlining possible directions for future research.
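For readers unfamiliar with the setting, the sketch below simulates a clustered Bernoulli instance and runs a plain KL-UCB baseline on it. The Clus-UCB index itself is not reproduced in this abstract, so the instance (cluster means, threshold eps = 0.05) and all names in the code are illustrative assumptions; the baseline here ignores the cluster structure that Clus-UCB would exploit.

# Illustrative sketch only: uses a plain KL-UCB index (not the paper's
# Clus-UCB index) on a hypothetical clustered Bernoulli instance.
import math
import random

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    q = min(max(q, 1e-12), 1 - 1e-12)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), via bisection."""
    target = math.log(max(t, 2))
    lo, hi = mean, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def run(horizon=10_000, seed=0):
    rng = random.Random(seed)
    # Hypothetical instance: two known clusters whose arm means differ
    # by at most eps = 0.05 within each cluster, as in the abstract.
    clusters = [[0.70, 0.68, 0.66], [0.40, 0.42, 0.38]]
    means = [m for c in clusters for m in c]
    k = len(means)
    pulls, sums = [0] * k, [0.0] * k
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:  # initialization: pull each arm once
            a = t - 1
        else:
            a = max(range(k),
                    key=lambda i: kl_ucb_index(sums[i] / pulls[i], pulls[i], t))
        reward = 1.0 if rng.random() < means[a] else 0.0
        pulls[a] += 1
        sums[a] += reward
        regret += max(means) - means[a]
    print(f"expected regret after {horizon} rounds: {regret:.1f}")

if __name__ == "__main__":
    run()

A cluster-aware method such as Clus-UCB would additionally use the known threshold to let observations from one arm constrain the indices of the other arms in its cluster; the regret of this baseline serves only as a point of comparison.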

Cite

Text

Gore and Chaporkar. "Clus-UCB: A Near-Optimal Algorithm for Clustered Bandits." Transactions on Machine Learning Research, 2026.

Markdown

[Gore and Chaporkar. "Clus-UCB: A Near-Optimal Algorithm for Clustered Bandits." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/gore2026tmlr-clusucb/)

BibTeX

@article{gore2026tmlr-clusucb,
  title     = {{Clus-UCB: A Near-Optimal Algorithm for Clustered Bandits}},
  author    = {Gore, Aakash and Chaporkar, Prasanna},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/gore2026tmlr-clusucb/}
}