A No-Regret Generalization of Hierarchical SoftMax to Extreme Multi-Label Classification

Abstract

Extreme multi-label classification (XMLC) is the problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be handled efficiently by organizing labels as a tree, as in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs), which have recently been devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@$k$ is used as the model evaluation metric. Critically, we prove that the pick-one-label heuristic, a reduction technique from multi-label to multi-class that is routinely used along with HSM, is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the pick-one-label heuristic and than XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive with many state-of-the-art approaches in terms of statistical performance, model size, and prediction time, which makes it amenable to deployment in an online system.
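The inconsistency claim can be illustrated on a toy example. The sketch below is not taken from the paper: the distribution over label sets is hypothetical, the function names are illustrative, and it assumes the pick-one-label heuristic selects one of an instance's positive labels uniformly at random. Because expected precision@$k$ is the average of the marginal probabilities of the predicted labels, ranking by marginals is optimal; the sketch shows that ranking by the pick-one-label class probabilities can select a different top label.

```python
from fractions import Fraction

# Hypothetical conditional distribution P(y | x) over label sets for one instance x.
label_set_probs = {
    frozenset({1}): Fraction(4, 10),
    frozenset({2, 3}): Fraction(3, 10),
    frozenset({2, 4}): Fraction(3, 10),
}

def marginals(dist):
    """Marginal probability of each label: sum of P(y | x) over sets containing it."""
    out = {}
    for labels, p in dist.items():
        for j in labels:
            out[j] = out.get(j, Fraction(0)) + p
    return out

def pick_one_label(dist):
    """Class probabilities after the pick-one-label reduction.

    Assumption: one positive label is drawn uniformly from each label set,
    so each set contributes P(y | x) / |y| to every label it contains.
    """
    out = {}
    for labels, p in dist.items():
        for j in labels:
            out[j] = out.get(j, Fraction(0)) + p / len(labels)
    return out

def top_k(scores, k):
    """Labels with the k highest scores; ties are broken by smaller label id."""
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]

def expected_precision_at_k(dist, predicted):
    """Expected fraction of predicted labels that are relevant under dist."""
    predicted = set(predicted)
    hits = sum(p * len(labels & predicted) for labels, p in dist.items())
    return hits / len(predicted)

best_by_marginal = top_k(marginals(label_set_probs), 1)
best_by_pick_one = top_k(pick_one_label(label_set_probs), 1)
print(best_by_marginal, expected_precision_at_k(label_set_probs, best_by_marginal))
print(best_by_pick_one, expected_precision_at_k(label_set_probs, best_by_pick_one))
```

In this toy distribution, label 2 has the largest marginal (3/5), yet the reduction assigns it only 3/10 because it always co-occurs with another label, so label 1 (2/5) is ranked first and the expected precision@1 falls from 3/5 to 2/5.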

Cite

Text

Wydmuch et al. "A No-Regret Generalization of Hierarchical SoftMax to Extreme Multi-Label Classification." Neural Information Processing Systems, 2018.

Markdown

[Wydmuch et al. "A No-Regret Generalization of Hierarchical SoftMax to Extreme Multi-Label Classification." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/wydmuch2018neurips-noregret/)

BibTeX

@inproceedings{wydmuch2018neurips-noregret,
  title     = {{A No-Regret Generalization of Hierarchical SoftMax to Extreme Multi-Label Classification}},
  author    = {Wydmuch, Marek and Jasinska, Kalina and Kuznetsov, Mikhail and Busa-Fekete, Róbert and Dembczynski, Krzysztof},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {6355-6366},
  url       = {https://mlanthology.org/neurips/2018/wydmuch2018neurips-noregret/}
}