Extreme Multi-Label Classification from Aggregated Labels

Abstract

Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples, not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes. We develop a new and scalable algorithm to impute individual-sample labels from the group labels; this can be paired with any existing XMC method to solve the aggregated label problem. We characterize the statistical properties of our algorithm under mild assumptions, and provide a new end-to-end framework for MIML as an extension. Experiments on both aggregated label XMC and MIML tasks show its advantages over existing approaches.

Cite

Text

Shen et al. "Extreme Multi-Label Classification from Aggregated Labels." International Conference on Machine Learning, 2020.

Markdown

[Shen et al. "Extreme Multi-Label Classification from Aggregated Labels." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/shen2020icml-extreme/)

BibTeX

@inproceedings{shen2020icml-extreme,
  title     = {{Extreme Multi-Label Classification from Aggregated Labels}},
  author    = {Shen, Yanyao and Yu, Hsiang-Fu and Sanghavi, Sujay and Dhillon, Inderjit},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {8752--8762},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/shen2020icml-extreme/}
}