Bandit Social Learning Under Myopic Behavior
Abstract
We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms' expected rewards. We derive stark exploration failures for any such behavior, and provide matching positive results. As a special case, we obtain the first general results on failure of the greedy algorithm in bandits, thus providing a theoretical foundation for why bandit algorithms should explore.
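The exploration failure described in the abstract can be illustrated with a toy simulation. The sketch below is illustrative only, not the paper's construction: a greedy rule (each "agent" pulls the arm with the best empirical mean, with no deliberate exploration) can lock onto the worse arm forever after an unlucky start. Arm means, horizon, and the one-initial-sample-per-arm convention are all assumptions chosen for the example.

```python
import random

def greedy_bandit(means, horizon, rng):
    """Myopic/greedy play: each round, pull the arm with the highest
    empirical mean reward so far, breaking ties at random."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    # One initial sample per arm (a common convention; an assumption here).
    for a in range(k):
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
    for _ in range(horizon - k):
        # Greedy choice on empirical means; random tie-breaking.
        best = max(range(k), key=lambda a: (sums[a] / counts[a], rng.random()))
        r = 1.0 if rng.random() < means[best] else 0.0
        counts[best] += 1
        sums[best] += r
    return counts

rng = random.Random(0)
means = [0.7, 0.3]  # arm 0 is better in expectation
runs, horizon = 2000, 200
stuck = 0
for _ in range(runs):
    counts = greedy_bandit(means, horizon, rng)
    if counts[1] > counts[0]:  # most pulls went to the worse arm
        stuck += 1
print(f"fraction of runs stuck on the worse arm: {stuck / runs:.3f}")
```

If the first pull of the worse arm succeeds while the first pull of the better arm fails, the greedy rule never revisits the better arm, so a constant fraction of runs incurs linear regret; this is the kind of exploration failure the paper formalizes for a broad class of myopic behaviors.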
Cite
Text
Banihashem et al. "Bandit Social Learning Under Myopic Behavior." Neural Information Processing Systems, 2023.
Markdown
[Banihashem et al. "Bandit Social Learning Under Myopic Behavior." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/banihashem2023neurips-bandit/)
BibTeX
@inproceedings{banihashem2023neurips-bandit,
title = {{Bandit Social Learning Under Myopic Behavior}},
author = {Banihashem, Kiarash and Hajiaghayi, MohammadTaghi and Shin, Suho and Slivkins, Aleksandrs},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/banihashem2023neurips-bandit/}
}