Robust Bloom Filters for Large MultiLabel Classification Tasks

Abstract

This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive application of Bloom filters in MLC is not robust to individual binary classifiers' errors. We then present an approach that exploits a specific feature of real-world datasets when the number of labels is large: many labels (almost) never appear together. Our approch is provably robust, has sublinear training and inference complexity with respect to the number of labels, and compares favorably to state-of-the-art algorithms on two large scale multilabel datasets.

Cite

Text

Cisse et al. "Robust Bloom Filters for Large MultiLabel Classification Tasks." Neural Information Processing Systems, 2013.

Markdown

[Cisse et al. "Robust Bloom Filters for Large MultiLabel Classification Tasks." Neural Information Processing Systems, 2013.](https://mlanthology.org/neurips/2013/cisse2013neurips-robust/)

BibTeX

@inproceedings{cisse2013neurips-robust,
  title     = {{Robust Bloom Filters for Large MultiLabel Classification Tasks}},
  author    = {Cisse, Moustapha M and Usunier, Nicolas and Artières, Thierry and Gallinari, Patrick},
  booktitle = {Neural Information Processing Systems},
  year      = {2013},
  pages     = {1851-1859},
  url       = {https://mlanthology.org/neurips/2013/cisse2013neurips-robust/}
}