Fast Structured Decoding for Sequence Models

Abstract

Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, because of their autoregressive factorization, these models suffer from high latency during inference. Recently, non-autoregressive sequence models were proposed to speed up inference. However, these models assume that the decoding of each token is conditionally independent of the others. Such a generation process sometimes makes the output sentence inconsistent, and thus the learned non-autoregressive models could only achieve inferior accuracy compared to their autoregressive counterparts. To improve the decoding consistency and reduce the inference cost at the same time, we propose to incorporate a structured inference module into the non-autoregressive models. Specifically, we design an efficient approximation for Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF. Experiments in machine translation show that, while adding little latency (8~14 ms), our model achieves significantly better translation performance than previous non-autoregressive models on different translation datasets. In particular, for the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which substantially outperforms the previous non-autoregressive baselines and is only 0.61 lower in BLEU than purely autoregressive models.
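
To illustrate why a structured inference module can improve consistency, here is a minimal sketch contrasting independent per-position decoding with exact Viterbi decoding in a textbook linear-chain CRF. It is not the efficient approximation or the dynamic transition technique proposed in the paper, which the abstract does not detail. The function names, the three-token vocabulary, and all scores are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def independent_decode(emissions: Matrix) -> List[int]:
    """Pick the best token at each position, ignoring neighbouring positions."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(emissions: Matrix, transitions: Matrix) -> List[int]:
    """Exact MAP decoding for a linear-chain CRF.

    emissions[t][j]   : score of token j at position t
    transitions[i][j] : score of token i being followed by token j
    """
    n_labels = len(transitions)
    score = list(emissions[0])          # best path score ending in each token
    backptrs: List[List[int]] = []
    for row in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_labels):
            # Best previous token i for current token j.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + row[j])
        score = new_score
        backptrs.append(ptr)
    # Follow the back-pointers from the best final token.
    last = max(range(n_labels), key=score.__getitem__)
    path = [last]
    for ptr in reversed(backptrs):
        last = ptr[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical toy scores: two positions, three tokens.
emissions = [[2.0, 1.0, 0.0],
             [1.5, 1.4, 0.0]]
# Assumed transition scores that penalise repeating the same token.
transitions = [[-1.0, 0.0, 0.0],
               [0.0, -1.0, 0.0],
               [0.0, 0.0, -1.0]]

print(independent_decode(emissions))           # [0, 0]
print(viterbi_decode(emissions, transitions))  # [0, 1]
```

In this toy case the independent decoder repeats token 0, while the CRF's pairwise scores steer the second position to token 1. Exact Viterbi costs time quadratic in the number of labels per position, which is expensive for a translation-sized vocabulary and is one plausible reason an efficient approximation is needed.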

Cite

Text

Sun et al. "Fast Structured Decoding for Sequence Models." Neural Information Processing Systems, 2019.

Markdown

[Sun et al. "Fast Structured Decoding for Sequence Models." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/sun2019neurips-fast/)

BibTeX

@inproceedings{sun2019neurips-fast,
  title     = {{Fast Structured Decoding for Sequence Models}},
  author    = {Sun, Zhiqing and Li, Zhuohan and Wang, Haoqing and He, Di and Lin, Zi and Deng, Zhihong},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {3016-3026},
  url       = {https://mlanthology.org/neurips/2019/sun2019neurips-fast/}
}