Oracle-Efficient Combinatorial Semi-Bandits
Abstract
We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at *every* round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For worst-case linear rewards, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.
Cite
Text
Kim et al. "Oracle-Efficient Combinatorial Semi-Bandits." Advances in Neural Information Processing Systems, 2025.Markdown
[Kim et al. "Oracle-Efficient Combinatorial Semi-Bandits." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/kim2025neurips-oracleefficient/)BibTeX
@inproceedings{kim2025neurips-oracleefficient,
title = {{Oracle-Efficient Combinatorial Semi-Bandits}},
author = {Kim, Jung-hun and Vojnovic, Milan and Oh, Min-hwan},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/kim2025neurips-oracleefficient/}
}