A Scalable Hierarchical Distributed Language Model
Abstract
Neural probabilistic language models (NPLMs) have been shown to be competitive with and occasionally superior to the widely used n-gram language models. The main drawback of NPLMs is their extremely long training and testing times. Morin and Bengio have proposed a hierarchical language model built around a binary tree of words that was two orders of magnitude faster than the non-hierarchical language model it was based on. However, it performed considerably worse than its non-hierarchical counterpart in spite of using a word tree created with expert knowledge. We introduce a fast hierarchical language model along with a simple feature-based algorithm for automatic construction of word trees from the data. We then show that the resulting models can outperform non-hierarchical models and achieve state-of-the-art performance.
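The abstract only names the key idea, so the following is a minimal sketch of how a hierarchical language model of this kind evaluates a word probability: each word is a leaf of a binary tree, and its probability is a product of binary (sigmoid) decisions made at the internal nodes on its path. The product-over-path form and the log-bilinear context combination follow the paper's general setup; the class name, variable names, and the toy tree are our illustrative assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class HierarchicalLM:
    """Illustrative sketch of a hierarchical log-bilinear language model.
    Each word is a leaf of a binary tree; P(word | context) is a product
    of sigmoid decisions at the internal nodes on the word's path."""

    def __init__(self, vocab_size, dim, context_size, paths, seed=0):
        rng = np.random.default_rng(seed)
        n_nodes = 1 + max(n for path in paths.values() for n, _ in path)
        self.R = rng.normal(0.0, 0.01, (vocab_size, dim))         # word feature vectors
        self.C = rng.normal(0.0, 0.01, (context_size, dim, dim))  # per-position context matrices
        self.Q = rng.normal(0.0, 0.01, (n_nodes, dim))            # internal-node vectors
        self.b = np.zeros(n_nodes)                                # per-node biases
        self.paths = paths  # word -> [(node_id, direction)], direction in {0, 1}

    def predicted_rep(self, context):
        # Combine the context word vectors with position-dependent matrices.
        return sum(self.C[i] @ self.R[w] for i, w in enumerate(context))

    def prob(self, word, context):
        # P(word | context): product of sigmoids along the word's tree path,
        # so the cost is O(dim * tree depth) instead of O(dim * vocab_size).
        r_hat = self.predicted_rep(context)
        p = 1.0
        for node, direction in self.paths[word]:
            go_right = sigmoid(self.Q[node] @ r_hat + self.b[node])
            p *= go_right if direction == 1 else 1.0 - go_right
        return p

# Toy balanced tree over a 4-word vocabulary (hypothetical): node 0 is the
# root, nodes 1 and 2 are its children; each path lists (node, branch taken).
paths = {
    0: [(0, 0), (1, 0)],
    1: [(0, 0), (1, 1)],
    2: [(0, 1), (2, 0)],
    3: [(0, 1), (2, 1)],
}
model = HierarchicalLM(vocab_size=4, dim=8, context_size=2, paths=paths)
print(model.prob(2, context=[0, 3]))  # the four word probabilities sum to 1
```

Because each word's probability touches only the nodes on its path, roughly log of the vocabulary size, evaluation avoids the full-vocabulary normalization that makes flat NPLMs slow, which is the source of the speedup the abstract refers to.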
Cite
Text:
Mnih and Hinton. "A Scalable Hierarchical Distributed Language Model." Neural Information Processing Systems, 2008.
Markdown:
[Mnih and Hinton. "A Scalable Hierarchical Distributed Language Model." Neural Information Processing Systems, 2008.](https://mlanthology.org/neurips/2008/mnih2008neurips-scalable/)
BibTeX:
@inproceedings{mnih2008neurips-scalable,
  title     = {{A Scalable Hierarchical Distributed Language Model}},
  author    = {Mnih, Andriy and Hinton, Geoffrey E.},
  booktitle = {Neural Information Processing Systems},
  year      = {2008},
  pages     = {1081--1088},
  url       = {https://mlanthology.org/neurips/2008/mnih2008neurips-scalable/}
}