Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Abstract

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baselines by over 5.6 points in perplexity.
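
The method the paper proposes to break this bottleneck is a Mixture of Softmaxes (MoS) output layer: the conditional distribution is a convex combination of several softmax components, so the element-wise log of the resulting probability matrix is no longer limited to rank at most the embedding dimension d, as a single softmax over a low-rank logit matrix is. The NumPy sketch below is only an illustration of that idea, not the authors' implementation; the tensor names, shapes, and the tanh projection are assumptions chosen for a minimal example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_softmaxes(h, W_prior, W_proj, word_emb):
    """
    h:        (d,)            context vector from the RNN (hypothetical shapes)
    W_prior:  (n_mix, d)      projection producing mixture (prior) weights
    W_proj:   (n_mix, d, d)   per-component context projections
    word_emb: (vocab, d)      output word embeddings
    Returns a (vocab,) next-word distribution.
    """
    # Mixture weights pi_k: a softmax over the n_mix components.
    pi = softmax(W_prior @ h)                          # (n_mix,)
    # Per-component context vectors h_k = tanh(W_k h).
    h_k = np.tanh(np.einsum('kij,j->ki', W_proj, h))   # (n_mix, d)
    # Per-component softmax distributions over the vocabulary.
    comp = softmax(h_k @ word_emb.T, axis=-1)          # (n_mix, vocab)
    # Convex combination of component distributions; its log is not
    # constrained to lie in a rank-d factorization the way a single
    # softmax over h @ word_emb.T would be.
    return pi @ comp                                   # (vocab,)

# Toy usage with made-up sizes.
d, vocab, n_mix = 8, 100, 3
rng = np.random.default_rng(0)
p = mixture_of_softmaxes(rng.normal(size=d),
                         rng.normal(size=(n_mix, d)),
                         rng.normal(size=(n_mix, d, d)),
                         rng.normal(size=(vocab, d)))
assert np.isclose(p.sum(), 1.0)
```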

Cite

Text

Yang et al. "Breaking the Softmax Bottleneck: A High-Rank RNN Language Model." International Conference on Learning Representations, 2018.

Markdown

[Yang et al. "Breaking the Softmax Bottleneck: A High-Rank RNN Language Model." International Conference on Learning Representations, 2018.](https://mlanthology.org/iclr/2018/yang2018iclr-breaking/)

BibTeX

@inproceedings{yang2018iclr-breaking,
  title     = {{Breaking the Softmax Bottleneck: A High-Rank RNN Language Model}},
  author    = {Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W.},
  booktitle = {International Conference on Learning Representations},
  year      = {2018},
  url       = {https://mlanthology.org/iclr/2018/yang2018iclr-breaking/}
}