Cluster Agnostic Network Lasso Bandits
Abstract
We consider a multi-task contextual bandit setting, where the learner is given a graph encoding relations between the bandit tasks. The tasks' preference vectors are assumed to be piecewise constant over the graph, forming clusters. At every round, we estimate the preference vectors by solving an online network lasso problem with a suitably chosen, time-dependent regularization parameter. We establish a novel oracle inequality relying on a convenient restricted eigenvalue assumption. Our theoretical findings highlight the importance of dense intra-cluster connections and sparse inter-cluster ones. That results in a sublinear regret bound significantly lower than its counterpart in the independent task learning setting. Finally, we support our theoretical findings by experimental evaluation against graph bandit multi-task learning and online clustering of bandits algorithms.
Cite
Text
Dhouib et al. "Cluster Agnostic Network Lasso Bandits." Transactions on Machine Learning Research, 2025.Markdown
[Dhouib et al. "Cluster Agnostic Network Lasso Bandits." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/dhouib2025tmlr-cluster/)BibTeX
@article{dhouib2025tmlr-cluster,
title = {{Cluster Agnostic Network Lasso Bandits}},
author = {Dhouib, Sofien and Bilaj, Steven and Nourani-Koliji, Behzad and Maghsudi, Setareh},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/dhouib2025tmlr-cluster/}
}