Bandit Social Learning Under Myopic Behavior

Abstract

We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms' expected rewards. We derive stark exploration failures for any such behavior, and provide matching positive results. As a special case, we obtain the first general results on failure of the greedy algorithm in bandits, thus providing a theoretical foundation for why bandit algorithms should explore.
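To make the exploration failure concrete, here is a minimal simulation sketch (not from the paper; the reward means, horizon, and tie-breaking rule are illustrative assumptions). Each agent greedily picks the arm with the highest empirical mean of earlier agents' "reviews," so an early unlucky draw on the better arm can lock all later agents onto the worse arm:

```python
import random

def greedy_social_learning(means, num_agents, rng):
    """Each agent myopically chooses the arm with the highest empirical
    mean reward (untried arms are tried first), then leaves a Bernoulli
    'review' that all later agents observe."""
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for _ in range(num_agents):
        untried = [a for a in range(len(means)) if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            # Myopic (greedy) choice: no exploration bonus of any kind.
            arm = max(range(len(means)), key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.9, 0.5]  # arm 0 is the better arm (illustrative values)
    trials = 2000
    failures = sum(
        1
        for _ in range(trials)
        if (c := greedy_social_learning(means, 200, rng))[1] > c[0]
    )
    # A constant fraction of runs gets stuck on the worse arm:
    # e.g. arm 0 returns 0 once, arm 1 returns 1, and arm 0 is never retried.
    print(f"fraction of runs stuck on the worse arm: {failures / trials:.3f}")
```

With these parameters, roughly a constant fraction of runs (about 5%, the probability that arm 0's first review is bad while arm 1's is good) never revisits the better arm, which is the kind of exploration failure the paper formalizes for a broad class of myopic behaviors.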

Cite

Text

Banihashem et al. "Bandit Social Learning Under Myopic Behavior." Neural Information Processing Systems, 2023.

Markdown

[Banihashem et al. "Bandit Social Learning Under Myopic Behavior." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/banihashem2023neurips-bandit/)

BibTeX

@inproceedings{banihashem2023neurips-bandit,
  title     = {{Bandit Social Learning Under Myopic Behavior}},
  author    = {Banihashem, Kiarash and Hajiaghayi, MohammadTaghi and Shin, Suho and Slivkins, Aleksandrs},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/banihashem2023neurips-bandit/}
}