Nearly Minimax Optimal Submodular Maximization with Bandit Feedback

Abstract

We consider maximizing an unknown monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ with cardinality constraint under stochastic bandit feedback. At each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$ where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum $f(S_*)$ with $|S_*| = k$, obtained through robust greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$. And by trivially treating every set as a unique arm one deduces that $\sqrt{ {n \choose k} T }$ is also achievable using standard multi-armed bandit algorithms. In this work, we establish the first minimax lower bound for this setting that scales like $\tilde{\Omega}(\min_{L \le k}(L^{1/3}n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$. For a slightly restricted algorithm class, we prove a stronger regret lower bound of $\tilde{\Omega}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$. Moreover, we propose an algorithm Sub-UCB that achieves regret $\tilde{\mathcal{O}}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$ capable of matching the lower bound on regret for the restricted class up to logarithmic factors.

Cite

Text

Tajdini et al. "Nearly Minimax Optimal Submodular Maximization with Bandit Feedback." Neural Information Processing Systems, 2024. doi:10.52202/079017-3051

Markdown

[Tajdini et al. "Nearly Minimax Optimal Submodular Maximization with Bandit Feedback." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/tajdini2024neurips-nearly/) doi:10.52202/079017-3051

BibTeX

@inproceedings{tajdini2024neurips-nearly,
  title     = {{Nearly Minimax Optimal Submodular Maximization with Bandit Feedback}},
  author    = {Tajdini, Artin and Jain, Lalit and Jamieson, Kevin},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3051},
  url       = {https://mlanthology.org/neurips/2024/tajdini2024neurips-nearly/}
}