On Multilabel Classification and Ranking with Bandit Feedback

Abstract

We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on second-order descent methods and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partially adversarial setting, where covariates can be adversarial but multilabel probabilities are ruled by (generalized) linear models. We show $O(T^{1/2}\log T)$ regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on diverse real-world multilabel data sets, often obtaining comparable performance.

Cite

Text

Gentile and Orabona. "On Multilabel Classification and Ranking with Bandit Feedback." Journal of Machine Learning Research, 2014.

Markdown

[Gentile and Orabona. "On Multilabel Classification and Ranking with Bandit Feedback." Journal of Machine Learning Research, 2014.](https://mlanthology.org/jmlr/2014/gentile2014jmlr-multilabel/)

BibTeX

@article{gentile2014jmlr-multilabel,
  title     = {{On Multilabel Classification and Ranking with Bandit Feedback}},
  author    = {Gentile, Claudio and Orabona, Francesco},
  journal   = {Journal of Machine Learning Research},
  year      = {2014},
  pages     = {2451-2487},
  volume    = {15},
  url       = {https://mlanthology.org/jmlr/2014/gentile2014jmlr-multilabel/}
}