Multi-Agent Heterogeneous Stochastic Linear Bandits

Abstract

It has been empirically observed in several recommendation systems that their performance improves as more people join the system, by learning across heterogeneous users. In this paper, we seek to theoretically understand this phenomenon by studying the problem of minimizing regret in an $N$-user heterogeneous stochastic linear bandits framework. We study this problem under two models of heterogeneity: (i) a personalization framework, where no two users are necessarily identical but all are similar, and (ii) a clustering framework, where users are partitioned into groups, with users in the same group being identical but different across groups. In the personalization framework, we introduce a natural algorithm in which the personal bandit instances are initialized with the estimates of the global average model, and show that any agent $i$ whose parameter deviates from the population average by $\epsilon_i$ attains a regret scaling of $\widetilde{O}(\epsilon_i\sqrt{T})$. In the clustered-users setup, we propose a successive refinement algorithm which, for any agent, achieves regret scaling as $\mathcal{O}(\sqrt{T/N})$ if the agent is in a 'well-separated' cluster, or as $\mathcal{O}(T^{\frac{1}{2}+\varepsilon}/N^{\frac{1}{2}-\varepsilon})$ if its cluster is not well separated, where $\varepsilon$ is positive and arbitrarily close to 0. Our algorithms enjoy several attractive features: they are problem-complexity adaptive and parameter-free. If there is structure, such as well-separated clusters or all users being similar to each other, then the regret of every agent decreases with $N$ (collaborative gain). On the other hand, in the worst case, the regret of any user is no worse than that of running an individual algorithm per user that does not leverage collaboration.
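The personalization idea from the abstract can be illustrated with a minimal simulation. This is not the authors' algorithm, only a hedged sketch under assumed parameters: $N$ agents share a common average parameter `theta_avg`, each agent's true parameter deviates from it by a small amount `eps`, and a pooled ridge-regression estimate from an exploration phase is used to warm-start each agent's personal estimate. All dimensions, horizons, noise levels, and the greedy arm-selection rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, T = 5, 10, 200  # assumed dimension, number of agents, horizon

# Ground truth (assumption): per-agent parameters are small
# perturbations of a shared population-average parameter.
theta_avg = rng.normal(size=d)
theta_avg /= np.linalg.norm(theta_avg)
eps = 0.05
thetas = theta_avg + eps * rng.normal(size=(N, d))

def pooled_estimate(num_rounds=50):
    """Exploration phase: every agent plays random unit arms; all
    observations are pooled into one ridge-regression estimate of
    the population-average parameter."""
    A = np.eye(d)          # ridge Gram matrix
    b = np.zeros(d)
    for _ in range(num_rounds):
        for i in range(N):
            x = rng.normal(size=d)
            x /= np.linalg.norm(x)
            r = x @ thetas[i] + 0.1 * rng.normal()  # noisy linear reward
            A += np.outer(x, x)
            b += x * r
    return np.linalg.solve(A, b)

def personalized_run(theta_init, i, num_rounds=T):
    """Personal phase for agent i: refine a private ridge estimate,
    warm-started at theta_init, playing greedily among a few random
    candidate arms each round. Returns total expected reward."""
    A = np.eye(d)
    b = A @ theta_init     # warm start: initial ridge solution equals theta_init
    total = 0.0
    for _ in range(num_rounds):
        theta_hat = np.linalg.solve(A, b)
        arms = rng.normal(size=(10, d))
        arms /= np.linalg.norm(arms, axis=1, keepdims=True)
        x = arms[np.argmax(arms @ theta_hat)]       # greedy choice
        r = x @ thetas[i] + 0.1 * rng.normal()
        A += np.outer(x, x)
        b += x * r
        total += x @ thetas[i]
    return total

theta0 = pooled_estimate()
warm = personalized_run(theta0, i=0)          # collaborative warm start
cold = personalized_run(np.zeros(d), i=0)     # no collaboration
```

Since agent 0's parameter is within `eps` of `theta_avg`, the pooled estimate is already an accurate starting point for its personal instance, which is the intuition behind the $\widetilde{O}(\epsilon_i\sqrt{T})$ regret scaling.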

Cite

Text

Ghosh et al. "Multi-Agent Heterogeneous Stochastic Linear Bandits." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26412-2_19

Markdown

[Ghosh et al. "Multi-Agent Heterogeneous Stochastic Linear Bandits." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/ghosh2022ecmlpkdd-multiagent/) doi:10.1007/978-3-031-26412-2_19

BibTeX

@inproceedings{ghosh2022ecmlpkdd-multiagent,
  title     = {{Multi-Agent Heterogeneous Stochastic Linear Bandits}},
  author    = {Ghosh, Avishek and Sankararaman, Abishek and Ramchandran, Kannan},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2022},
  pages     = {300--316},
  doi       = {10.1007/978-3-031-26412-2_19},
  url       = {https://mlanthology.org/ecmlpkdd/2022/ghosh2022ecmlpkdd-multiagent/}
}