Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Abstract

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy. Most prior works attribute this poor performance to the low-dimensional bottleneck in embedding-based methods. In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of their poor performance. These findings motivate us to investigate novel data augmentation and regularization techniques to mitigate overfitting. To this end, we propose GLaS, a new regularizer for embedding-based neural network approaches. It is a natural generalization of the graph Laplacian and spread-out regularizers, and empirically it addresses the drawbacks of each regularizer applied alone in the extreme classification setup. With the proposed techniques, we attain or improve upon the state-of-the-art on most widely tested public extreme classification datasets with hundreds of thousands of labels.
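To make the relationship between the two regularizers concrete, below is a minimal NumPy sketch of the graph Laplacian penalty (pulls co-occurring labels' embeddings together), the spread-out penalty (pushes distinct labels' embeddings toward orthogonality), and a GLaS-style penalty that interpolates between them by matching the label Gram matrix to a normalized co-occurrence target. The function names, the co-occurrence matrix `A`, and the specific normalization used for the target `S` are illustrative assumptions for this sketch, not the paper's exact definitions.

```python
import numpy as np

def graph_laplacian_reg(V, A):
    """Graph Laplacian penalty: sum_ij A_ij ||v_i - v_j||^2 = tr(V^T (D - A) V).
    V: (L, d) label embedding matrix; A: (L, L) label co-occurrence counts."""
    D = np.diag(A.sum(axis=1))
    Lap = D - A
    return np.trace(V.T @ Lap @ V)

def spread_out_reg(V):
    """Spread-out penalty: sum of squared inner products between distinct labels,
    encouraging near-orthogonal label embeddings."""
    G = V @ V.T
    return np.sum(G ** 2) - np.sum(np.diag(G) ** 2)

def glas_style_reg(V, A):
    """GLaS-style penalty (illustrative sketch): match the Gram matrix V V^T to a
    symmetrized, frequency-normalized co-occurrence target S, so frequently
    co-occurring labels keep similar embeddings while unrelated labels are
    pushed toward orthogonality. The exact target used in the paper may differ."""
    counts = np.maximum(np.diag(A), 1.0)                     # per-label frequencies
    S = 0.5 * (A / counts[:, None] + A / counts[None, :])    # assumed normalization
    G = V @ V.T
    L = V.shape[0]
    return np.sum((G - S) ** 2) / (L * L)
```

In this sketch, setting the target `S` to zero off the diagonal recovers a spread-out-style penalty, while keeping only the attraction between co-occurring pairs mirrors the graph Laplacian term, which is the sense in which a GLaS-style objective generalizes both.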

Cite

Text

Guo et al. "Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces." Neural Information Processing Systems, 2019.

Markdown

[Guo et al. "Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/guo2019neurips-breaking/)

BibTeX

@inproceedings{guo2019neurips-breaking,
  title     = {{Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces}},
  author    = {Guo, Chuan and Mousavi, Ali and Wu, Xiang and Holtmann-Rice, Daniel N. and Kale, Satyen and Reddi, Sashank and Kumar, Sanjiv},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {4943--4953},
  url       = {https://mlanthology.org/neurips/2019/guo2019neurips-breaking/}
}