Multiplayer Information Asymmetric Contextual Bandits

Abstract

Single-player contextual bandits are a well-studied problem in reinforcement learning, with applications in fields such as advertising, healthcare, and finance. In light of recent work on information-asymmetric bandits, we propose a novel multiplayer information-asymmetric contextual bandit framework in which multiple players each have their own set of actions. At every round, the players observe the same context vectors and simultaneously take an action from their own sets, giving rise to a joint action. Upon taking this action, however, the players are subject to information asymmetry in (1) actions and/or (2) rewards. We design an algorithm, mLinUCB, by modifying the classical single-player algorithm LinUCB of Chu et al. (2011) to achieve the optimal regret $O(\sqrt{T})$ when only one kind of asymmetry is present. We then propose a novel algorithm, ETC, built on the explore-then-commit principle, which achieves the same optimal regret when both types of asymmetry are present.
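
The paper's algorithms are not reproduced on this page, but for orientation, below is a minimal sketch of the single-player LinUCB round that mLinUCB builds on. The function names, the exploration parameter alpha, and the NumPy-based interface are illustrative assumptions; the multiplayer and information-asymmetry machinery of mLinUCB and ETC is not shown.

import numpy as np

def linucb_select(contexts, A, b, alpha=1.0):
    # One round of arm selection in single-player LinUCB (Chu et al., 2011).
    # contexts: length-K list of d-dimensional context vectors, one per arm.
    # A[k] (d x d) and b[k] (d,) hold arm k's ridge-regression statistics.
    scores = []
    for k, x in enumerate(contexts):
        A_inv = np.linalg.inv(A[k])
        theta = A_inv @ b[k]  # least-squares estimate of arm k's parameter
        # Mean reward estimate plus an optimism bonus for exploration.
        scores.append(theta @ x + alpha * np.sqrt(x @ A_inv @ x))
    return int(np.argmax(scores))

def linucb_update(A, b, k, x, reward):
    # Fold the observed reward for the chosen arm k into its statistics.
    A[k] += np.outer(x, x)
    b[k] += reward * x

# Usage (hypothetical sizes): K arms, d-dimensional contexts,
# with A initialized to identity matrices and b to zero vectors.
K, d = 5, 8
A = [np.eye(d) for _ in range(K)]
b = [np.zeros(d) for _ in range(K)]

The per-arm statistics implement a ridge regression of rewards on contexts, and the square-root term is the confidence width that drives optimistic exploration; the multiplayer setting additionally requires the players to coordinate their joint action despite not observing each other's actions and/or rewards.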

Cite

Text

Chang and Lu. "Multiplayer Information Asymmetric Contextual Bandits." Transactions on Machine Learning Research, 2025.

Markdown

[Chang and Lu. "Multiplayer Information Asymmetric Contextual Bandits." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/chang2025tmlr-multiplayer/)

BibTeX

@article{chang2025tmlr-multiplayer,
  title     = {{Multiplayer Information Asymmetric Contextual Bandits}},
  author    = {Chang, William and Lu, Yuanhao},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/chang2025tmlr-multiplayer/}
}