Fair Algorithms for Multi-Agent Multi-Armed Bandits

Abstract

We propose a multi-agent variant of the classical multi-armed bandit problem, in which there are $N$ agents and $K$ arms, and pulling an arm generates a (possibly different) stochastic reward for each agent. Unlike the classical multi-armed bandit problem, the goal is not to learn the "best arm"; indeed, each agent may perceive a different arm to be the best for her personally. Instead, we seek to learn a fair distribution over the arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness. We design multi-agent variants of three classic multi-armed bandit algorithms and show that they achieve sublinear regret, which is now measured in terms of the lost Nash social welfare. We also extend a classical lower bound, establishing the optimality of one of our algorithms.
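For concreteness, here is a sketch of the objective under one natural reading of the abstract; the notation $p$, $\mu_{i,k}$, and $p^*$ is ours, not quoted from the paper. If the learner commits to a distribution $p = (p_1, \dots, p_K)$ over the $K$ arms and agent $i$ derives mean reward $\mu_{i,k}$ from arm $k$, the Nash social welfare is the product of the agents' expected rewards,

$$\mathrm{NSW}(p) \;=\; \prod_{i=1}^{N} \Big( \sum_{k=1}^{K} p_k \, \mu_{i,k} \Big),$$

and regret over $T$ rounds would accumulate the shortfall against an NSW-maximizing distribution $p^*$,

$$R(T) \;=\; \sum_{t=1}^{T} \big( \mathrm{NSW}(p^*) - \mathrm{NSW}(p_t) \big).$$

Because the product (equivalently, the sum of log-rewards) is maximized by balancing the agents' expected rewards rather than by any single arm, the optimum is in general a genuinely mixed distribution, which is why learning a "best arm" is not the right target in this setting.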

Cite

Text

Hossain et al. "Fair Algorithms for Multi-Agent Multi-Armed Bandits." Neural Information Processing Systems, 2021.

Markdown

[Hossain et al. "Fair Algorithms for Multi-Agent Multi-Armed Bandits." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/hossain2021neurips-fair/)

BibTeX

@inproceedings{hossain2021neurips-fair,
  title     = {{Fair Algorithms for Multi-Agent Multi-Armed Bandits}},
  author    = {Hossain, Safwan and Micha, Evi and Shah, Nisarg},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/hossain2021neurips-fair/}
}