Bandit Guided Submodular Curriculum for Adaptive Subset Selection
Abstract
Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores for curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce OnlineSubmod, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, OnlineSubmod outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule. Our code is publicly available on GitHub: https://github.com/efficiency-learning/banditsubmod/.
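The abstract's core idea, a bandit whose arms are submodular functions, each selecting subsets via greedy maximization and scored by a validation-driven reward, can be illustrated with a minimal sketch. The two submodular utilities (facility location and a concave-over-modular feature coverage), the epsilon-greedy arm-selection rule, and the proxy reward below are illustrative assumptions, not the OnlineSubmod algorithm from the paper:

```python
import numpy as np

def facility_location(S, sim):
    # Facility-location utility: how well the selected set S "covers"
    # every point under the similarity kernel (monotone submodular).
    if not S:
        return 0.0
    return float(sim[:, list(S)].max(axis=1).sum())

def feature_coverage(S, feats):
    # Concave-over-modular utility: sqrt of accumulated feature mass
    # per dimension (also monotone submodular for nonnegative features).
    if not S:
        return 0.0
    return float(np.sqrt(feats[list(S)].sum(axis=0)).sum())

def greedy_select(utility, n, budget):
    # Standard greedy maximization of a monotone submodular function:
    # repeatedly add the element with the largest marginal gain.
    S = set()
    for _ in range(budget):
        gains = [(utility(S | {i}) - utility(S), i)
                 for i in range(n) if i not in S]
        _, best_i = max(gains)
        S.add(best_i)
    return S

rng = np.random.default_rng(0)
X = rng.random((20, 5))            # toy dataset: 20 samples, 5 features
sim = X @ X.T                      # assumption: inner-product similarity

# Each bandit arm is one submodular selection criterion.
arms = [lambda S: facility_location(S, sim),
        lambda S: feature_coverage(S, X)]

# Epsilon-greedy bandit over the arms (a simple stand-in for the
# paper's no-regret policy); reward is a proxy for validation utility.
values, counts, eps = np.zeros(2), np.zeros(2), 0.2
full_value = facility_location(set(range(20)), sim)
for t in range(30):
    a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(values))
    S = greedy_select(arms[a], n=20, budget=5)
    reward = facility_location(S, sim) / full_value  # in [0, 1] by monotonicity
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]    # running mean per arm
```

At each round the policy picks an arm, greedily selects a budget-limited subset under that arm's submodular function, and updates the arm's value estimate with the observed reward; in the paper this reward would come from validation performance rather than the coverage proxy used here.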
Cite
Text
Chanda et al. "Bandit Guided Submodular Curriculum for Adaptive Subset Selection." Advances in Neural Information Processing Systems, 2025.
Markdown
[Chanda et al. "Bandit Guided Submodular Curriculum for Adaptive Subset Selection." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/chanda2025neurips-bandit/)
BibTeX
@inproceedings{chanda2025neurips-bandit,
title = {{Bandit Guided Submodular Curriculum for Adaptive Subset Selection}},
author = {Chanda, Prateek and Agrawal, Prayas and Sureka, Saral and Polu, Lokesh Reddy and Kshirsagar, Atharv and Ramakrishnan, Ganesh},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/chanda2025neurips-bandit/}
}