Middle-Out Decoding

Abstract

Despite being virtually ubiquitous, sequence-to-sequence models are challenged by their lack of diversity and inability to be externally controlled. In this paper, we speculate that a fundamental shortcoming of sequence generation models is that the decoding is done strictly from left to right, meaning that output values generated earlier have a profound effect on those generated later. To address this issue, we propose a novel middle-out decoder architecture that begins from an initial middle word and simultaneously expands the sequence in both directions. To facilitate information flow and maintain consistent decoding, we introduce a dual self-attention mechanism that allows us to model complex dependencies between the outputs. We illustrate the performance of our model on the task of video captioning, as well as on a synthetic sequence de-noising task. Our middle-out decoder achieves significant improvements on de-noising and competitive performance on video captioning, while quantifiably improving caption diversity. Furthermore, we perform a qualitative analysis that demonstrates our ability to effectively control the generation process of our decoder.
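To make the decoding order concrete, the following is a minimal sketch of middle-out expansion under loudly stated assumptions. It is not the paper's model: the learned left and right decoders and the dual self-attention mechanism are replaced by hypothetical lookup-table step functions (`step_left`, `step_right`, built by the illustrative helper `make_toy_steps`), and the boundary tokens `<s>` and `</s>` are assumed names. It only shows how a sequence can grow outward in both directions from a chosen middle word.

```python
from collections import deque
from typing import Callable, List, Tuple

# Assumed boundary markers; the names are illustrative only.
BOS, EOS = "<s>", "</s>"

StepFn = Callable[[Tuple[str, ...]], str]


def middle_out_decode(middle: str, step_left: StepFn, step_right: StepFn,
                      max_len: int = 20) -> List[str]:
    """Grow a sequence outward from `middle`, one token per side per round.

    `step_left` and `step_right` are hypothetical stand-ins for learned
    decoders: each sees the whole partial sequence and returns the next
    token for its side, or a boundary token when that side is finished.
    """
    seq = deque([middle])
    left_done = right_done = False
    # The length cap is checked once per round, so it may overshoot by one.
    while not (left_done and right_done) and len(seq) < max_len:
        if not right_done:
            tok = step_right(tuple(seq))
            if tok == EOS:
                right_done = True
            else:
                seq.append(tok)
        if not left_done:
            tok = step_left(tuple(seq))
            if tok == BOS:
                left_done = True
            else:
                seq.appendleft(tok)
    return list(seq)


def make_toy_steps(sentence: List[str]) -> Tuple[StepFn, StepFn]:
    """Build lookup-table step functions from one sentence of unique words."""
    nxt = dict(zip(sentence, sentence[1:] + [EOS]))
    prv = dict(zip(sentence, [BOS] + sentence[:-1]))
    # The toy steps only look at the outermost token on their own side.
    return (lambda seq: prv[seq[0]]), (lambda seq: nxt[seq[-1]])


step_left, step_right = make_toy_steps(["a", "man", "is", "playing", "guitar"])
caption = middle_out_decode("playing", step_left, step_right)
print(caption)
```

In this toy setting the choice of middle word acts as the external control signal: starting from a different word of the same sentence seeds the expansion at a different position, while a real decoder would condition both directions on the input and on each other.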

Cite

Text

Mehri and Sigal. "Middle-Out Decoding." Neural Information Processing Systems, 2018.

Markdown

[Mehri and Sigal. "Middle-Out Decoding." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/mehri2018neurips-middleout/)

BibTeX

@inproceedings{mehri2018neurips-middleout,
  title     = {{Middle-Out Decoding}},
  author    = {Mehri, Shikib and Sigal, Leonid},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {5518-5529},
  url       = {https://mlanthology.org/neurips/2018/mehri2018neurips-middleout/}
}