Multiclass Classification with Bandit Feedback Using Adaptive Regularization
Abstract
We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit of right-or-wrong, rather than the true label. Our algorithm is based on the second-order Perceptron, and uses upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model, which is also chosen adversarially. From the theoretical viewpoint, we show a regret of O(\sqrt{T}\log(T)), which improves over the current best bounds of O(T^{2/3}) in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems, obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.
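To illustrate the kind of algorithm the abstract describes, here is a hedged sketch of a bandit multiclass learner: per-class second-order (regularized least-squares) predictors, an upper-confidence term added to each class score to drive exploration, and an update that uses only the one-bit right-or-wrong feedback on the predicted label. This is an illustrative sketch, not the paper's exact algorithm; the class name, the `alpha` exploration knob, and the update constants are assumptions.

```python
import numpy as np

class BanditSecondOrderPerceptron:
    """Illustrative sketch (not the paper's exact method): a second-order
    Perceptron-style learner that sees only bandit feedback and uses an
    upper-confidence term to trade off exploration and exploitation."""

    def __init__(self, n_features, n_classes, alpha=1.0):
        self.K = n_classes
        self.alpha = alpha  # width of the confidence term (assumed knob)
        # One regularized least-squares predictor per class:
        # inverse correlation matrix A^{-1} and accumulated signed inputs b.
        self.A_inv = np.stack([np.eye(n_features) for _ in range(n_classes)])
        self.b = np.zeros((n_classes, n_features))

    def predict(self, x):
        scores = np.empty(self.K)
        for k in range(self.K):
            w = self.A_inv[k] @ self.b[k]          # RLS weight vector
            margin = w @ x
            width = np.sqrt(x @ self.A_inv[k] @ x)  # confidence width
            scores[k] = margin + self.alpha * width  # optimistic score
        return int(np.argmax(scores))

    def update(self, x, pred, correct):
        # Bandit feedback: a single bit saying whether the predicted
        # label was right, so only the predicted class is updated.
        y = 1.0 if correct else -1.0
        k = pred
        # Sherman-Morrison rank-one update of the inverse matrix.
        Ax = self.A_inv[k] @ x
        self.A_inv[k] -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        self.b[k] += y * x
```

On a simple separable stream, repeatedly calling `predict` and then `update` with the one-bit feedback lets the learner recover the correct labels after a few rounds, even though it never observes the true label on mistakes.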
Cite
Text
Crammer and Gentile. "Multiclass Classification with Bandit Feedback Using Adaptive Regularization." International Conference on Machine Learning, 2011. doi:10.1007/s10994-012-5321-8
Markdown
[Crammer and Gentile. "Multiclass Classification with Bandit Feedback Using Adaptive Regularization." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/crammer2011icml-multiclass/) doi:10.1007/s10994-012-5321-8
BibTeX
@inproceedings{crammer2011icml-multiclass,
title = {{Multiclass Classification with Bandit Feedback Using Adaptive Regularization}},
author = {Crammer, Koby and Gentile, Claudio},
booktitle = {International Conference on Machine Learning},
year = {2011},
pages = {273-280},
doi = {10.1007/s10994-012-5321-8},
url = {https://mlanthology.org/icml/2011/crammer2011icml-multiclass/}
}