Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints
Abstract
The multi-armed bandit (MAB) problem with fairness constraints has recently emerged as an important research topic. For such problems, a common objective is to maximize the total reward over a fixed number of rounds while satisfying a fairness requirement: a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advances in designing efficient online selection algorithms; however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with a concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient and, more importantly, achieves a sublinear regret bound with high probability. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it substantially outperforms the baselines.
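To make the problem setup concrete, the sketch below simulates a combinatorial bandit where m of N arms are pulled each round and each arm must be selected at least a fraction c_i of the time. It uses a standard virtual-queue (Lyapunov drift-plus-penalty) heuristic on top of UCB estimates; this is only an illustration of the constrained setting, not the authors' OCO-based algorithm, and all parameters (N, m, T, c, eta) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N, m, T = 5, 2, 20000              # arms, arms pulled per round, horizon (hypothetical)
mu = rng.uniform(0.2, 0.9, N)      # unknown mean rewards (simulated environment)
c = np.full(N, 0.25)               # minimum selection fraction per arm; feasible since sum(c) <= m
eta = 50.0                         # trade-off weight between reward and fairness debt

counts = np.zeros(N)               # number of times each arm has been pulled
means = np.zeros(N)                # empirical mean reward per arm
Q = np.zeros(N)                    # virtual queues tracking each arm's fairness debt

for t in range(1, T + 1):
    # UCB index; unpulled arms get +inf so every arm is tried at least once
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)
    # Drift-plus-penalty score: fairness debt plus weighted (clipped) reward estimate
    score = Q + eta * np.minimum(ucb, 1.0)
    chosen = np.argsort(score)[-m:]           # pull the m highest-scoring arms
    # Observe Bernoulli rewards and update empirical means incrementally
    rewards = rng.random(m) < mu[chosen]
    counts[chosen] += 1
    means[chosen] += (rewards - means[chosen]) / counts[chosen]
    # Queue update: debt grows by c_i each round, shrinks when arm i is pulled
    pulled = np.zeros(N)
    pulled[chosen] = 1
    Q = np.maximum(Q + c - pulled, 0.0)

print("selection fractions:", counts / T)     # each should end up near or above 0.25
```

Under this heuristic the queues Q_i push under-selected arms back into the chosen set, so the long-run selection fractions meet the constraints while the UCB term steers the remaining capacity toward high-reward arms; the paper's contribution is an OCO-based scheme that additionally attains a sublinear regret bound for concave objectives.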
Cite
Text
Xu et al. "Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/354
Markdown
[Xu et al. "Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/xu2020ijcai-combinatorial/) doi:10.24963/IJCAI.2020/354
BibTeX
@inproceedings{xu2020ijcai-combinatorial,
title = {{Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints}},
author = {Xu, Huanle and Liu, Yang and Lau, Wing Cheong and Li, Rui},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2020},
pages = {2554--2560},
doi = {10.24963/IJCAI.2020/354},
url = {https://mlanthology.org/ijcai/2020/xu2020ijcai-combinatorial/}
}