Multiclass Classification with Bandit Feedback Using Adaptive Regularization

Abstract

We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit indicating whether the predicted label is correct, rather than the true label. Our algorithm is based on the second-order Perceptron and uses upper-confidence bounds to trade off exploration and exploitation, instead of the random sampling performed by most current algorithms. We analyze this algorithm in a partially adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model, which is also chosen adversarially. We show a regret of $\mathcal{O}(\sqrt{T}\log T)$, which improves over the current best bound of $\mathcal{O}(T^{2/3})$ in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems and on four vowel recognition tasks, often obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.

Cite

Text

Crammer and Gentile. "Multiclass Classification with Bandit Feedback Using Adaptive Regularization." Machine Learning, 2013. doi:10.1007/s10994-012-5321-8

Markdown

[Crammer and Gentile. "Multiclass Classification with Bandit Feedback Using Adaptive Regularization." Machine Learning, 2013.](https://mlanthology.org/mlj/2013/crammer2013mlj-multiclass/) doi:10.1007/s10994-012-5321-8

BibTeX

@article{crammer2013mlj-multiclass,
  title     = {{Multiclass Classification with Bandit Feedback Using Adaptive Regularization}},
  author    = {Crammer, Koby and Gentile, Claudio},
  journal   = {Machine Learning},
  year      = {2013},
  pages     = {347--383},
  doi       = {10.1007/s10994-012-5321-8},
  volume    = {90},
  url       = {https://mlanthology.org/mlj/2013/crammer2013mlj-multiclass/}
}