Matroid Semi-Bandits in Sublinear Time

Abstract

We study the matroid semi-bandits problem, where at each round the learner plays a subset of $K$ arms from a feasible set, and the goal is to maximize the expected cumulative linear rewards. Existing algorithms have per-round time complexity at least $\Omega(K)$, which becomes expensive when $K$ is large. To address this computational issue, we propose FasterCUCB whose sampling rule takes time sublinear in $K$ for common classes of matroids: $\mathcal{O}(D\text{ polylog}(K)\text{ polylog}(T))$ for uniform matroids, partition matroids, and graphical matroids, and $\mathcal{O}(D\sqrt{K}\text{ polylog}(T))$ for transversal matroids. Here, $D$ is the maximum number of elements in any feasible subset of arms, and $T$ is the horizon. Our technique is based on dynamic maintenance of an approximate maximum-weight basis over inner-product weights. Although the introduction of an approximate maximum-weight basis presents a challenge in regret analysis, we can still guarantee an upper bound on regret as tight as CUCB in the sense that it matches the gap-dependent lower bound by Kveton et al. (2014a) asymptotically.

Cite

Text

Tzeng et al. "Matroid Semi-Bandits in Sublinear Time." International Conference on Machine Learning, 2024.

Markdown

[Tzeng et al. "Matroid Semi-Bandits in Sublinear Time." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/tzeng2024icml-matroid/)

BibTeX

@inproceedings{tzeng2024icml-matroid,
  title     = {{Matroid Semi-Bandits in Sublinear Time}},
  author    = {Tzeng, Ruo-Chun and Ohsaka, Naoto and Ariu, Kaito},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {48855-48877},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/tzeng2024icml-matroid/}
}