The Real Price of Bandit Information in Multiclass Classification
Abstract
We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input belongs to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary focus is the dependence on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}\big(\min\{|\mathcal{H}| + \sqrt{T},\, \sqrt{KT \log |\mathcal{H}|}\}\big)}$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
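The two terms inside the min correspond to two regimes: for moderately-sized hypothesis classes the new $|\mathcal{H}| + \sqrt{T}$ bound dominates, while for very large classes the classical $\sqrt{KT \log |\mathcal{H}|}$ rate is better. The sketch below (illustrative only; constants and log factors from the $\widetilde{\Theta}$ notation are dropped, and the function names are ours, not the paper's) makes the comparison concrete:

```python
import math

def bound_small_class(H, T):
    # Regret guarantee of the paper's new algorithm, up to
    # constants and log factors: |H| + sqrt(T).
    return H + math.sqrt(T)

def bound_classical(H, K, T):
    # Regret of classical bandit multiclass algorithms, up to
    # constants: sqrt(K * T * log|H|).
    return math.sqrt(K * T * math.log(H))

def minimax_rate(H, K, T):
    # Minimax regret (up to log factors) is the minimum of the two.
    return min(bound_small_class(H, T), bound_classical(H, K, T))

# Moderate class: |H| = 100, K = 1000 labels, T = 10^4 rounds.
# Here |H| + sqrt(T) = 200 is far below sqrt(K T log|H|) ~ 6800,
# so the new algorithm's bound determines the minimax rate.
print(minimax_rate(100, 1000, 10_000))

# Huge class: |H| = 10^6, K = 10, T = 10^4. Now |H| + sqrt(T)
# exceeds 10^6 while the classical bound is ~ 1200, so the
# classical sqrt(KT log|H|) rate dominates instead.
print(minimax_rate(10**6, 10, 10_000))
```

This matches the abstract's claim: the improvement over $\sqrt{KT}$-type bounds kicks in precisely when $|\mathcal{H}|$ is moderate relative to $\sqrt{KT}$.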
Cite

Erez et al. "The Real Price of Bandit Information in Multiclass Classification." Conference on Learning Theory, 2024.

BibTeX:
@inproceedings{erez2024colt-real,
title = {{The Real Price of Bandit Information in Multiclass Classification}},
author = {Erez, Liad and Cohen, Alon and Koren, Tomer and Mansour, Yishay and Moran, Shay},
booktitle = {Conference on Learning Theory},
year = {2024},
pages = {1573-1598},
volume = {247},
url = {https://mlanthology.org/colt/2024/erez2024colt-real/}
}