Efficient Training on Very Large Corpora via Gramian Estimation
Abstract
We study the problem of learning similarity functions over very large corpora using neural network embedding models. These models are typically trained using SGD with random sampling of unobserved pairs, with a sample size that grows quadratically with the corpus size, making it expensive to scale. We propose new efficient methods to train these models without having to sample unobserved pairs. Inspired by matrix factorization, our approach relies on adding a global quadratic penalty and expressing this term as the inner-product of two generalized Gramians. We show that the gradient of this term can be efficiently computed by maintaining estimates of the Gramians, and develop variance reduction schemes to improve the quality of the estimates. We conduct large-scale experiments that show a significant improvement in both training time and generalization performance compared to sampling methods.
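The Gramian identity at the heart of the abstract is easy to verify numerically. The sketch below is a minimal NumPy illustration, not the paper's actual variance-reduced estimators: it checks that the all-pairs quadratic penalty equals the inner product of two small k×k Gramians, and shows how a simple moving-average Gramian estimate yields the penalty gradient without ever sampling unobserved pairs. The embedding sizes, `alpha`, `batch_size`, and `lam` are illustrative placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 8                       # embedding dimension (illustrative)
n_left, n_right = 1000, 1200

U = rng.normal(scale=0.1, size=(n_left, k))    # left-tower embeddings
V = rng.normal(scale=0.1, size=(n_right, k))   # right-tower embeddings

# Gramian identity: sum_{i,j} <u_i, v_j>^2 == <G_U, G_V>,
# where G_U = U^T U and G_V = V^T V are k x k Gramians.
penalty_naive = np.sum((U @ V.T) ** 2)    # O(n*m*k): touches every pair
G_U, G_V = U.T @ U, V.T @ V               # O((n+m)*k^2): corpus-size linear
penalty_gramian = np.sum(G_U * G_V)       # Frobenius inner product
assert np.allclose(penalty_naive, penalty_gramian)

# A crude moving-average estimate of G_V from minibatches (hypothetical
# hyperparameters; the paper's estimators add variance reduction on top).
alpha, batch_size = 0.1, 64
G_V_hat = np.zeros((k, k))
for _ in range(200):
    batch = V[rng.integers(0, n_right, size=batch_size)]
    # Single-batch estimate of G_V, rescaled to the full corpus.
    G_batch = (n_right / batch_size) * batch.T @ batch
    G_V_hat = (1 - alpha) * G_V_hat + alpha * G_batch

# Gradient of lam * <G_U, G_V> w.r.t. u_i is 2 * lam * G_V u_i, so with the
# estimate each row costs O(k^2) instead of a pass over all m counterparts.
lam = 1e-3
grad_U = 2.0 * lam * U @ G_V_hat
```

Since the gradient only needs `G_V_hat` (and symmetrically `G_U_hat` for the right tower), the quadratic all-pairs term never has to be materialized, which is what removes the need to sample unobserved pairs.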
Cite

Text

Krichene et al. "Efficient Training on Very Large Corpora via Gramian Estimation." International Conference on Learning Representations, 2019.

Markdown

[Krichene et al. "Efficient Training on Very Large Corpora via Gramian Estimation." International Conference on Learning Representations, 2019.](https://mlanthology.org/iclr/2019/krichene2019iclr-efficient/)

BibTeX
@inproceedings{krichene2019iclr-efficient,
  title     = {{Efficient Training on Very Large Corpora via Gramian Estimation}},
  author    = {Krichene, Walid and Mayoraz, Nicolas and Rendle, Steffen and Zhang, Li and Yi, Xinyang and Hong, Lichan and Chi, Ed and Anderson, John},
  booktitle = {International Conference on Learning Representations},
  year      = {2019},
  url       = {https://mlanthology.org/iclr/2019/krichene2019iclr-efficient/}
}