Loss Decomposition for Fast Learning in Large Output Spaces

Abstract

For problems with large output spaces, evaluating the loss function and its gradient is expensive, typically taking time linear in the size of the output space. Recently, methods have been developed to speed up learning via efficient data structures for Nearest-Neighbor Search (NNS) or Maximum Inner-Product Search (MIPS). However, the performance of such data structures typically degrades in high dimensions. In this work, we propose a novel technique to reduce the intractable high-dimensional search problem to several much more tractable lower-dimensional ones via dual decomposition of the loss function. At the same time, we demonstrate guaranteed convergence to the original loss via a greedy message passing procedure. In our experiments on multiclass and multilabel classification with hundreds of thousands of classes, as well as on training skip-gram word embeddings with a vocabulary size of half a million, our technique consistently improves the accuracy of search-based gradient approximation methods and outperforms sampling-based gradient approximation methods by a large margin.
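
To make the decomposition idea concrete, here is a minimal sketch in plain NumPy, under simplifying assumptions: each class weight vector is split into lower-dimensional blocks, candidate classes are retrieved per block with a brute-force top-k search standing in for a per-block MIPS index, and a softmax loss is evaluated only on the retrieved candidates. The function names, block count, and candidate-union step are illustrative assumptions, not the authors' algorithm; in particular, the greedy message passing procedure that yields the convergence guarantee is omitted.

import numpy as np

def candidate_classes(W, x, num_blocks=4, topk_per_block=50):
    """W: (K, d) class weight matrix, x: (d,) input embedding.

    Splits the d dimensions into lower-dimensional blocks and returns the
    union of the top-scoring classes from each block. A real implementation
    would query a per-block MIPS/NNS index instead of scoring all K classes.
    """
    K, d = W.shape
    blocks = np.array_split(np.arange(d), num_blocks)
    candidates = set()
    for idx in blocks:
        # Inner products restricted to this block's dimensions only.
        scores = W[:, idx] @ x[idx]
        top = np.argpartition(-scores, topk_per_block)[:topk_per_block]
        candidates.update(top.tolist())
    return np.array(sorted(candidates))

def approx_softmax_loss(W, x, y, **kwargs):
    """Softmax loss evaluated only on retrieved candidates plus the true class y."""
    cand = candidate_classes(W, x, **kwargs)
    if y not in cand:
        cand = np.append(cand, y)
    logits = W[cand] @ x
    logits = logits - logits.max()          # numerical stability
    log_z = np.log(np.exp(logits).sum())
    true_pos = int(np.where(cand == y)[0][0])
    return log_z - logits[true_pos]

# Usage with random data (hypothetical sizes: K classes, d-dimensional embeddings).
rng = np.random.default_rng(0)
K, d = 10000, 128
W, x, y = rng.normal(size=(K, d)), rng.normal(size=d), 3
print(approx_softmax_loss(W, x, y))

The point of the per-block retrieval is that each search runs in a lower-dimensional space where NNS/MIPS data structures remain effective, and only the small candidate set is ever scored exactly when computing the loss and its gradient.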

Cite

Text

Yen et al. "Loss Decomposition for Fast Learning in Large Output Spaces." International Conference on Machine Learning, 2018.

Markdown

[Yen et al. "Loss Decomposition for Fast Learning in Large Output Spaces." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/yen2018icml-loss/)

BibTeX

@inproceedings{yen2018icml-loss,
  title     = {{Loss Decomposition for Fast Learning in Large Output Spaces}},
  author    = {Yen, Ian En-Hsu and Kale, Satyen and Yu, Felix and Holtmann-Rice, Daniel and Kumar, Sanjiv and Ravikumar, Pradeep},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {5640--5649},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/yen2018icml-loss/}
}