Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
Abstract
We address the online unconstrained submodular maximization problem (Online USM) in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC simultaneously satisfies an $O(d\log(dT))$ problem-dependent upper bound on the $1/2$-approximate pseudo-regret and an $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes of the upper bounds.
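As background, the offline routine that DG-ETC adapts is the randomized Double-Greedy of Buchbinder et al., a $1/2$-approximation in expectation for maximizing a non-negative (possibly non-monotone) submodular function without constraints. The sketch below is illustrative and not taken from the paper; the toy cut function and all names are assumptions for the example.

```python
import random

def double_greedy(f, ground_set, rng=None):
    """Randomized Double-Greedy for unconstrained submodular maximization.

    Maintains X (growing from the empty set) and Y (shrinking from the
    full set); for each element, it either adds it to X or removes it
    from Y with probability proportional to the positive parts of the
    two marginal gains. Gives a 1/2-approximation in expectation for
    non-negative submodular f.
    """
    if rng is None:
        rng = random.Random()
    X, Y = set(), set(ground_set)
    for i in ground_set:
        a = f(X | {i}) - f(X)      # marginal gain of adding i to X
        b = f(Y - {i}) - f(Y)      # marginal gain of removing i from Y
        a_pos, b_pos = max(a, 0.0), max(b, 0.0)
        if a_pos + b_pos == 0 or rng.random() < a_pos / (a_pos + b_pos):
            X.add(i)
        else:
            Y.discard(i)
    return X  # X == Y once every element has been decided

# Toy instance: the cut function of a 4-cycle, which is non-negative,
# non-monotone, and submodular (optimum 4, attained by S = {0, 2}).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
def cut(S):
    return sum(1 for u, v in edges if (u in S) != (v in S))

S = double_greedy(cut, range(4), random.Random(0))
```

In the bandit setting of the paper, the marginal gains `a` and `b` are not observed exactly and must instead be estimated from noisy reward samples, which is where the Explore-then-Commit component enters.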
Cite
Text
Zhou et al. "Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit." Proceedings of The 36th International Conference on Algorithmic Learning Theory, 2025.
Markdown
[Zhou et al. "Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit." Proceedings of The 36th International Conference on Algorithmic Learning Theory, 2025.](https://mlanthology.org/alt/2025/zhou2025alt-logarithmic/)
BibTeX
@inproceedings{zhou2025alt-logarithmic,
title = {{Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit}},
author = {Zhou, Julien and Gaillard, Pierre and Rahier, Thibaud and Arbel, Julyan},
booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
year = {2025},
pages = {1361--1385},
volume = {272},
url = {https://mlanthology.org/alt/2025/zhou2025alt-logarithmic/}
}