Robust Bloom Filters for Large MultiLabel Classification Tasks

Moustapha M Cisse, Nicolas Usunier, Thierry Artières, Patrick Gallinari

NeurIPS 2013 pp. 1851-1859

/neurips/2013/cisse2013neurips-robust/

Abstract

This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive application of Bloom filters in MLC is not robust to individual binary classifiers' errors. We then present an approach that exploits a specific feature of real-world datasets when the number of labels is large: many labels (almost) never appear together. Our approch is provably robust, has sublinear training and inference complexity with respect to the number of labels, and compares favorably to state-of-the-art algorithms on two large scale multilabel datasets.

PDF NeurIPS Semantic Scholar

Cite

Text

Cisse et al. "Robust Bloom Filters for Large MultiLabel Classification Tasks." Neural Information Processing Systems, 2013.

Markdown

[Cisse et al. "Robust Bloom Filters for Large MultiLabel Classification Tasks." Neural Information Processing Systems, 2013.](https://mlanthology.org/neurips/2013/cisse2013neurips-robust/)

BibTeX

@inproceedings{cisse2013neurips-robust,
  title     = {{Robust Bloom Filters for Large MultiLabel Classification Tasks}},
  author    = {Cisse, Moustapha M and Usunier, Nicolas and Artières, Thierry and Gallinari, Patrick},
  booktitle = {Neural Information Processing Systems},
  year      = {2013},
  pages     = {1851-1859},
  url       = {https://mlanthology.org/neurips/2013/cisse2013neurips-robust/}
}