Normalization Helps Training of Quantized LSTM

Abstract

The long short-term memory (LSTM) network, though powerful, is memory- and computation-expensive. To alleviate this problem, one approach is to compress its weights by quantization. However, existing quantization methods usually perform poorly when applied to LSTMs. In this paper, we first show theoretically that training a quantized LSTM is difficult because quantization makes the exploding gradient problem more severe, particularly when the LSTM weight matrices are large. We then show that the commonly used weight/layer/batch normalization schemes can help stabilize the gradient magnitude when training quantized LSTMs. Empirical results show that the normalized quantized LSTMs achieve significantly better results than their unnormalized counterparts. Their performance is also comparable with the full-precision LSTM, while being much smaller in size.
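To make the idea in the abstract concrete, here is a minimal PyTorch sketch (not the authors' implementation) of an LSTM cell whose weights are binarized with a straight-through estimator and whose gate pre-activations are layer-normalized to keep their scale, and hence the backpropagated gradient magnitude, stable. The names `QuantizedLSTMCell` and `binarize` are illustrative assumptions, not identifiers from the paper.

```python
# Hypothetical sketch of a quantized LSTM cell with layer normalization.
import torch
import torch.nn as nn
import torch.nn.functional as F


def binarize(w: torch.Tensor) -> torch.Tensor:
    """Binarize weights to sign(w) scaled by the mean magnitude.

    The straight-through estimator lets gradients pass through the
    non-differentiable quantizer unchanged.
    """
    scale = w.abs().mean()
    wb = torch.sign(w) * scale
    return w + (wb - w).detach()


class QuantizedLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        self.w_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.w_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
        # Layer normalization on the gate pre-activations stabilizes their scale,
        # which is the stabilizing effect the abstract describes.
        self.ln_ih = nn.LayerNorm(4 * hidden_size)
        self.ln_hh = nn.LayerNorm(4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = (self.ln_ih(F.linear(x, binarize(self.w_ih)))
                 + self.ln_hh(F.linear(h, binarize(self.w_hh)))
                 + self.bias)
        i, f, g, o = gates.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


# Usage: one step on random data.
cell = QuantizedLSTMCell(input_size=8, hidden_size=16)
x = torch.randn(4, 8)
h0 = torch.zeros(4, 16)
c0 = torch.zeros(4, 16)
h1, c1 = cell(x, (h0, c0))
```

Without the two `LayerNorm` layers, the binarized weight matrices can inflate the gate pre-activations as the hidden size grows, which is the mechanism behind the more severe exploding gradients discussed in the paper.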

Cite

Text

Hou et al. "Normalization Helps Training of Quantized LSTM." Neural Information Processing Systems, 2019.

Markdown

[Hou et al. "Normalization Helps Training of Quantized LSTM." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/hou2019neurips-normalization/)

BibTeX

@inproceedings{hou2019neurips-normalization,
  title     = {{Normalization Helps Training of Quantized LSTM}},
  author    = {Hou, Lu and Zhu, Jinhua and Kwok, James and Gao, Fei and Qin, Tao and Liu, Tie-Yan},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {7346--7356},
  url       = {https://mlanthology.org/neurips/2019/hou2019neurips-normalization/}
}