Exploration for Free: How Does Reward Heterogeneity Improve Regret in Cooperative Multi-Agent Bandits?

Wang, Xuchuang; Yang, Lin; Chen, Yu-zhen Janice; Liu, Xutong; Hajiesmaili, Mohammad; Towsley, Don; Lui, John C.S.

Exploration for Free: How Does Reward Heterogeneity Improve Regret in Cooperative Multi-Agent Bandits?

Xuchuang Wang, Lin Yang, Yu-zhen Janice Chen, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, John C.S. Lui

UAI 2023 pp. 2192-2202

/uai/2023/wang2023uai-exploration/

Abstract

This paper studies a cooperative multi-agent bandit scenario in which the rewards observed by agents are heterogeneous—one agent’s meat can be another agent’s poison. Specifically, the total reward observed by each agent is the sum of two values: an arm-specific reward, capturing the intrinsic value of the arm, and a privately-known agent-specific reward, which captures the personal preference/limitations of the agent. This heterogeneity in total reward leads to different local optimal arms for agents but creates an opportunity for \textit{free exploration} in a cooperative setting—an agent can freely explore its local optimal arm with no regret and share this free observation with some other agents who would suffer regrets if they pull this arm since the arm is not optimal for them. We first characterize a regret lower bound that captures free exploration, i.e., arms that can be freely explored have no contribution to the regret lower bound. Then, we present a cooperative bandit algorithm that takes advantage of free exploration and achieves a near-optimal regret upper bound which tightly matches the regret lower bound up to a constant factor. Lastly, we run numerical simulations to compare our algorithm with various baselines without free exploration.

PDF UAI OpenReview Semantic Scholar

Cite

Text

Wang et al. "Exploration for Free: How Does Reward Heterogeneity Improve Regret in Cooperative Multi-Agent Bandits?." Uncertainty in Artificial Intelligence, 2023.

Markdown

[Wang et al. "Exploration for Free: How Does Reward Heterogeneity Improve Regret in Cooperative Multi-Agent Bandits?." Uncertainty in Artificial Intelligence, 2023.](https://mlanthology.org/uai/2023/wang2023uai-exploration/)

BibTeX

@inproceedings{wang2023uai-exploration,
  title     = {{Exploration for Free: How Does Reward Heterogeneity Improve Regret in Cooperative Multi-Agent Bandits?}},
  author    = {Wang, Xuchuang and Yang, Lin and Chen, Yu-zhen Janice and Liu, Xutong and Hajiesmaili, Mohammad and Towsley, Don and Lui, John C.S.},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2023},
  pages     = {2192-2202},
  volume    = {216},
  url       = {https://mlanthology.org/uai/2023/wang2023uai-exploration/}
}