Decentralized Two-Sided Bandit Learning in Matching Market

Abstract

Two-sided matching under uncertainty has recently drawn much attention due to its wide applications. Existing works on matching bandits mainly focus on the one-sided learning setting and design algorithms that converge to a stable matching with low regret. In this paper, we consider the more general two-sided learning setting, i.e., participants on both sides have to learn their preferences over the other side through repeated interactions. Inspired by the classical result that the optimal matching for the proposing side can be obtained using the Gale-Shapley algorithm, we ask whether this result still holds in the two-sided learning setting. To answer this question, we formally introduce the two-sided learning setting and propose strategies for both the arm side and the player side, without restrictive assumptions such as a special preference structure or observation of winning players. Our results not only answer this question in the affirmative but also provide a near-optimal upper bound, achieving $O(\log T)$ regret.
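For context, the classical Gale-Shapley (deferred acceptance) procedure that the abstract refers to, which yields the proposer-optimal stable matching under known preferences, can be sketched as follows. This is a minimal illustration of the textbook algorithm, not the paper's bandit-learning method; the function and variable names are our own.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Proposer-optimal deferred acceptance (known preferences).

    proposer_prefs[p]: list of receivers, most preferred first.
    receiver_prefs[r]: list of proposers, most preferred first.
    Returns a dict mapping each proposer to its matched receiver.
    """
    # rank[r][p]: position of proposer p in receiver r's list (lower = better).
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next index each proposer tries
    engaged = {}                                  # receiver -> current proposer
    free = list(proposer_prefs)                   # proposers not yet matched

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p                        # r tentatively accepts p
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])               # r trades up; old partner freed
            engaged[r] = p
        else:
            free.append(p)                        # r rejects p; p tries next choice
    return {p: r for r, p in engaged.items()}
```

In the paper's setting, neither side knows these preference lists in advance; both must be learned from noisy feedback over repeated rounds.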

Cite

Text

Zhang and Fang. "Decentralized Two-Sided Bandit Learning in Matching Market." Uncertainty in Artificial Intelligence, 2024.

Markdown

[Zhang and Fang. "Decentralized Two-Sided Bandit Learning in Matching Market." Uncertainty in Artificial Intelligence, 2024.](https://mlanthology.org/uai/2024/zhang2024uai-decentralized/)

BibTeX

@inproceedings{zhang2024uai-decentralized,
  title     = {{Decentralized Two-Sided Bandit Learning in Matching Market}},
  author    = {Zhang, Yirui and Fang, Zhixuan},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2024},
  pages     = {4173--4191},
  volume    = {244},
  url       = {https://mlanthology.org/uai/2024/zhang2024uai-decentralized/}
}