An Explore-Then-Commit Algorithm for Submodular Maximization Under Full-Bandit Feedback
Abstract
We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no information other than the reward of the selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG), and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that ETCG empirically outperforms other full-bandit methods.
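To make the explore-then-commit structure concrete, the following is a minimal Python sketch of a phased greedy procedure of the kind the abstract describes: in each of $k$ phases, every remaining base element is sampled as an augmentation of the current set, the empirically best one is added, and the final set is played for the remainder of the horizon. The function name `etcg`, the reward oracle `play`, and the per-candidate sample budget `m` are illustrative assumptions, not the paper's exact specification.

```python
import math


def etcg(arms, k, T, play):
    """Sketch of an explore-then-commit greedy routine.

    arms : list of n base elements
    k    : cardinality constraint (assumed k <= n)
    T    : horizon
    play : play(S) -> stochastic reward of playing set S (full-bandit feedback)
    """
    assert k <= len(arms)
    n = len(arms)
    # Per-candidate exploration budget. The n^(1/3) k^(4/3) T^(2/3) log(T)^(1/2)
    # bound suggests a budget on the order of (T/n)^(2/3) log(T)^(1/2) samples;
    # the exact constant here is an illustrative assumption.
    m = max(1, int((T / n) ** (2 / 3) * math.sqrt(math.log(max(T, 2)))))

    S, t = [], 0
    for _ in range(k):  # k greedy phases
        best_arm, best_mean = None, -float("inf")
        for a in arms:
            if a in S:
                continue
            total = 0.0
            for _ in range(m):  # estimate E[f(S + {a})] from full-bandit feedback
                total += play(S + [a])
                t += 1
            if total / m > best_mean:
                best_arm, best_mean = a, total / m
        S.append(best_arm)  # greedily commit the empirically best augmentation

    while t < T:  # exploitation: commit to the greedy set for the rest of the horizon
        play(S)
        t += 1
    return S
```

Under this sampling schedule the exploration phase uses roughly $nkm$ plays, so the sketch implicitly assumes $T$ is large enough that exploration fits within the horizon.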
Cite
Text
Nie et al. "An Explore-Then-Commit Algorithm for Submodular Maximization Under Full-Bandit Feedback." Uncertainty in Artificial Intelligence, 2022.
Markdown
[Nie et al. "An Explore-Then-Commit Algorithm for Submodular Maximization Under Full-Bandit Feedback." Uncertainty in Artificial Intelligence, 2022.](https://mlanthology.org/uai/2022/nie2022uai-explorethencommit/)
BibTeX
@inproceedings{nie2022uai-explorethencommit,
  title     = {{An Explore-Then-Commit Algorithm for Submodular Maximization Under Full-Bandit Feedback}},
  author    = {Nie, Guanyu and Agarwal, Mridul and Umrawal, Abhishek Kumar and Aggarwal, Vaneet and Quinn, Christopher John},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2022},
  pages     = {1541--1551},
  volume    = {180},
  url       = {https://mlanthology.org/uai/2022/nie2022uai-explorethencommit/}
}