Pure Exploration and Regret Minimization in Matching Bandits

Abstract

Finding an optimal matching in a weighted graph is a standard combinatorial problem. We consider its semi-bandit version where either a pair or a full matching is sampled sequentially. We prove that it is possible to leverage a rank-1 assumption on the adjacency matrix to reduce the sample complexity and the regret of off-the-shelf algorithms down to a linear dependency on the number of vertices (up to poly-log terms).
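
To make the setting concrete, here is a minimal sketch (not from the paper; the quality vectors, the Bernoulli noise model, and all names below are illustrative assumptions): each side of the bipartite graph carries a quality vector, expected edge rewards are their outer product (hence a rank-1 adjacency matrix), and the optimal full matching follows from sorting qualities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative rank-1 model: left vertices have qualities u, right vertices
# have qualities v, and the expected reward of edge (i, j) is u[i] * v[j],
# so the expected adjacency matrix W = u v^T has rank 1.
n = 5
u = rng.uniform(0.1, 1.0, size=n)
v = rng.uniform(0.1, 1.0, size=n)
W = np.outer(u, v)

def pull_matching(perm):
    """Semi-bandit feedback: play the full matching {(i, perm[i])} and
    observe one noisy (Bernoulli) reward per selected edge."""
    means = W[np.arange(n), perm]
    return rng.binomial(1, means)

# Under the rank-1 model, the rearrangement inequality gives the optimal
# matching in closed form: pair the k-th best left vertex with the
# k-th best right vertex.
order_u = np.argsort(-u)
order_v = np.argsort(-v)
opt = np.empty(n, dtype=int)
opt[order_u] = order_v

print("optimal matching:", list(zip(range(n), opt)))
print("one round of semi-bandit feedback:", pull_matching(opt))
```

The point of the structure is that the 2n vertex qualities determine all n^2 edge means, which is roughly the intuition behind the linear dependency on the number of vertices stated in the abstract.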

Cite

Text

Sentenac et al. "Pure Exploration and Regret Minimization in Matching Bandits." International Conference on Machine Learning, 2021.

Markdown

[Sentenac et al. "Pure Exploration and Regret Minimization in Matching Bandits." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/sentenac2021icml-pure/)

BibTeX

@inproceedings{sentenac2021icml-pure,
  title     = {{Pure Exploration and Regret Minimization in Matching Bandits}},
  author    = {Sentenac, Flore and Yi, Jialin and Calauzenes, Clement and Perchet, Vianney and Vojnovic, Milan},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {9434--9442},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/sentenac2021icml-pure/}
}