Distributed Negative Sampling for Word Embeddings

Abstract

Word2Vec recently popularized dense vector word representations as fixed-length features for machine learning algorithms and is in widespread use today. In this paper we investigate one of its core components, Negative Sampling, and propose efficient distributed algorithms that allow us to scale to vocabulary sizes of more than 1 billion unique words and corpus sizes of more than 1 trillion words.
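For context, the component the paper scales is the standard skip-gram negative-sampling (SGNS) objective from Word2Vec: each observed (center, context) pair is pushed toward label 1 while a handful of words drawn from a noise distribution are pushed toward label 0. The sketch below shows that single-machine update only, not the paper's distributed algorithms; all names, sizes, and hyperparameters are illustrative assumptions.

import numpy as np

# Minimal single-machine SGNS negative-sampling step (Mikolov et al., 2013).
# The paper distributes this computation; that part is not reproduced here.
rng = np.random.default_rng(0)

VOCAB, DIM, NEG, LR = 10_000, 100, 5, 0.025   # assumed toy sizes
W_in = rng.normal(scale=0.1, size=(VOCAB, DIM))   # "input" word vectors
W_out = np.zeros((VOCAB, DIM))                    # "output" context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, noise_dist):
    """One SGD step: pull (center, context) together, push NEG noise words apart."""
    negatives = rng.choice(VOCAB, size=NEG, p=noise_dist)
    targets = np.concatenate(([context], negatives))
    labels = np.concatenate(([1.0], np.zeros(NEG)))  # 1 = true pair, 0 = noise

    v = W_in[center]                   # (DIM,)
    u = W_out[targets]                 # (NEG+1, DIM)
    scores = sigmoid(u @ v)            # predicted P(pair is real)
    grad = (scores - labels)[:, None]  # gradient of the logistic loss

    W_in[center] -= LR * (grad * u).sum(axis=0)
    W_out[targets] -= LR * grad * v

# The usual noise distribution: unigram counts raised to the 3/4 power.
counts = rng.integers(1, 1_000, size=VOCAB).astype(float)
noise = counts ** 0.75
noise /= noise.sum()

sgns_step(center=42, context=7, noise_dist=noise)

At billion-word vocabularies the embedding matrices above no longer fit on one machine, which is the regime the paper's distributed algorithms target.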

Cite

Text

Stergiou et al. "Distributed Negative Sampling for Word Embeddings." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10931

Markdown

[Stergiou et al. "Distributed Negative Sampling for Word Embeddings." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/stergiou2017aaai-distributed/) doi:10.1609/AAAI.V31I1.10931

BibTeX

@inproceedings{stergiou2017aaai-distributed,
  title     = {{Distributed Negative Sampling for Word Embeddings}},
  author    = {Stergiou, Stergios and Straznickas, Zygimantas and Wu, Rolina and Tsioutsiouliklis, Kostas},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {2569--2575},
  doi       = {10.1609/AAAI.V31I1.10931},
  url       = {https://mlanthology.org/aaai/2017/stergiou2017aaai-distributed/}
}