Greedy Algorithms for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure
Abstract
We study the greedy (exploitation-only) algorithm in bandit problems with a known reward structure. We allow arbitrary finite reward structures, whereas prior work focused on a few specific ones. We fully characterize when the greedy algorithm asymptotically succeeds or fails, in the sense of sublinear vs. linear regret as a function of time. Our characterization identifies a partial-identifiability property of the problem instance as the necessary and sufficient condition for asymptotic success. Notably, once this property holds, the problem becomes easy: *any* algorithm succeeds (in the same sense as above), provided it satisfies a mild non-degeneracy condition. Our characterization extends to contextual bandits and interactive decision-making with arbitrary feedback. We provide examples demonstrating broad applicability, as well as extensions to infinite reward structures.
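To make the setting concrete, here is a minimal toy sketch of an exploitation-only (greedy) learner on a structured bandit: the learner knows a finite set of candidate mean-reward vectors, fits whichever candidate is closest to its empirical estimates, and always plays the best arm under that fitted candidate. The particular structure, the squared-error fit, and all parameters below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite reward structure: each row is a candidate vector of
# Bernoulli means, one entry per arm. The true instance is one of these
# rows, but the learner does not know which.
structure = np.array([
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.5],
    [0.4, 0.4, 0.7],
])
true_means = structure[0]
n_arms = structure.shape[1]

counts = np.zeros(n_arms)  # number of pulls per arm
sums = np.zeros(n_arms)    # total observed reward per arm

def greedy_arm():
    """Exploitation only: fit the candidate instance closest to the
    empirical means (count-weighted squared error), then play the
    best arm of that fitted instance. No explicit exploration."""
    emp = np.where(counts > 0, sums / np.maximum(counts, 1), 0.5)
    fit = ((structure - emp) ** 2 * counts).sum(axis=1)
    best_instance = structure[np.argmin(fit)]
    return int(np.argmax(best_instance))

T = 2000
regret = 0.0
for t in range(T):
    a = greedy_arm()
    r = rng.binomial(1, true_means[a])
    counts[a] += 1
    sums[a] += r
    regret += true_means.max() - true_means[a]

print(round(regret, 1))
```

Whether such a greedy rule incurs sublinear or linear regret depends on the instance; the paper's identifiability condition is exactly what separates the two regimes.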
Cite

Text

Slivkins et al. "Greedy Algorithms for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure." Advances in Neural Information Processing Systems, 2025.

Markdown

[Slivkins et al. "Greedy Algorithms for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/slivkins2025neurips-greedy/)

BibTeX
@inproceedings{slivkins2025neurips-greedy,
title = {{Greedy Algorithms for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure}},
author = {Slivkins, Aleksandrs and Xu, Yunzong and Zuo, Shiliang},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/slivkins2025neurips-greedy/}
}