A Scalable Hierarchical Distributed Language Model

Abstract

Neural probabilistic language models (NPLMs) have been shown to be competitive with, and occasionally superior to, the widely used n-gram language models. The main drawback of NPLMs is their extremely long training and testing times. Morin and Bengio have proposed a hierarchical language model built around a binary tree of words that was two orders of magnitude faster than the non-hierarchical language model it was based on. However, it performed considerably worse than its non-hierarchical counterpart in spite of using a word tree created with expert knowledge. We introduce a fast hierarchical language model along with a simple feature-based algorithm for automatic construction of word trees from the data. We then show that the resulting models can outperform non-hierarchical models and achieve state-of-the-art performance.
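
For intuition, the key idea behind such tree-based models can be sketched as follows: each word in the vocabulary sits at a leaf of a binary tree, and the probability of a word given its context is the product of binary (sigmoid) decisions made at the internal nodes on the path from the root to that leaf, reducing the cost of one word's probability from O(|V|) to O(log |V|). The Python/NumPy sketch below is a minimal illustration with assumed names (tree_word_prob, node_vectors, word_paths); it is not the paper's exact hierarchical log-bilinear model, which additionally learns per-node biases.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tree_word_prob(context_vec, path, node_vectors):
    # Probability of one word under a binary word tree.
    # path: list of (node_id, direction) pairs from the root to the
    # word's leaf, with direction +1 for "left" and -1 for "right".
    # Each internal node has a learned vector; the branching decision
    # is a sigmoid of its dot product with the context representation,
    # so one word's probability costs O(log |V|) rather than O(|V|).
    prob = 1.0
    for node_id, direction in path:
        prob *= sigmoid(direction * np.dot(context_vec, node_vectors[node_id]))
    return prob

# Toy example: a 4-word vocabulary in a balanced tree with 3 internal nodes.
rng = np.random.default_rng(0)
node_vectors = rng.normal(size=(3, 5))  # one dim-5 vector per internal node
context_vec = rng.normal(size=5)        # predicted representation of the next word

# Paths from the root (node 0) through internal nodes 1 and 2 to the leaves.
word_paths = {
    "a":   [(0, +1), (1, +1)],
    "b":   [(0, +1), (1, -1)],
    "the": [(0, -1), (2, +1)],
    "dog": [(0, -1), (2, -1)],
}

probs = {w: tree_word_prob(context_vec, p, node_vectors) for w, p in word_paths.items()}
print(probs)                # per-word probabilities
print(sum(probs.values()))  # sums to 1.0, since sigmoid(x) + sigmoid(-x) = 1

Because sigmoid(x) + sigmoid(-x) = 1 at every internal node, the leaf probabilities of a full binary tree always form a proper distribution, which is what makes the speedup possible without renormalizing over the whole vocabulary.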

Cite

Text

Mnih and Hinton. "A Scalable Hierarchical Distributed Language Model." Neural Information Processing Systems, 2008.

Markdown

[Mnih and Hinton. "A Scalable Hierarchical Distributed Language Model." Neural Information Processing Systems, 2008.](https://mlanthology.org/neurips/2008/mnih2008neurips-scalable/)

BibTeX

@inproceedings{mnih2008neurips-scalable,
  title     = {{A Scalable Hierarchical Distributed Language Model}},
  author    = {Mnih, Andriy and Hinton, Geoffrey E.},
  booktitle = {Neural Information Processing Systems},
  year      = {2008},
  pages     = {1081--1088},
  url       = {https://mlanthology.org/neurips/2008/mnih2008neurips-scalable/}
}