Compositional De-Attention Networks

Abstract

Attentional models are distinctly characterized by their ability to learn relative importance, i.e., to assign different weights to input values. This paper proposes a new quasi-attention mechanism that is compositional in nature, i.e., it learns whether to *add*, *subtract* or *nullify* a certain vector when learning representations. This stands in strong contrast to vanilla attention, which simply re-weights input tokens. Our proposed *Compositional De-Attention* (CoDA) is built upon the intuition of using both similarity and dissimilarity (negative affinity) when computing affinity scores, and therefore benefits from greater expressiveness. We evaluate CoDA on six NLP tasks: open-domain question answering, retrieval/ranking, natural language inference, machine translation, sentiment analysis, and text2code generation. We obtain promising experimental results, achieving state-of-the-art performance on several tasks/datasets.
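
The abstract only describes CoDA at a high level; the snippet below is a minimal, illustrative sketch of that description, not the authors' released implementation. It pairs a tanh-bounded similarity score (so a value vector can be added or subtracted) with a sigmoid gate over a dissimilarity score (so a value vector can be nullified). The function name `coda_attention`, the use of negative L1 distance for the dissimilarity term, and the sqrt(d) scaling are assumptions made for illustration.

```python
# Minimal sketch of a CoDA-style quasi-attention layer (NumPy).
# Assumptions: single head, no masking; dissimilarity is sketched as a
# negative pairwise L1 distance, and both affinities are scaled by sqrt(d).
import numpy as np

def coda_attention(Q, K, V):
    """Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d)."""
    d = Q.shape[-1]
    # Similarity affinity: scaled dot product squashed to [-1, 1] by tanh,
    # so a value vector can be added (positive) or subtracted (negative).
    E = np.tanh(Q @ K.T / np.sqrt(d))
    # Dissimilarity affinity: negative pairwise L1 distance squashed to
    # [0, 1] by a sigmoid, so a value vector can be nullified (gate near 0).
    N = -np.abs(Q[:, None, :] - K[None, :, :]).sum(-1) / np.sqrt(d)
    G = 1.0 / (1.0 + np.exp(-N))
    # Compositional weights in [-1, 1]: add, subtract, or (near) zero out.
    M = E * G
    return M @ V

# Toy usage: 4 queries/keys of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = coda_attention(Q, K, V)  # shape (4, 8)
```

Unlike softmax attention, the weight matrix here is not a probability distribution; each entry lies in [-1, 1], which is what lets the model add, subtract, or effectively drop individual value vectors rather than only re-weight them.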

Cite

Text

Tay et al. "Compositional De-Attention Networks." Neural Information Processing Systems, 2019.

Markdown

[Tay et al. "Compositional De-Attention Networks." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/tay2019neurips-compositional/)

BibTeX

@inproceedings{tay2019neurips-compositional,
  title     = {{Compositional De-Attention Networks}},
  author    = {Tay, Yi and Luu, Anh Tuan and Zhang, Aston and Wang, Shuohang and Hui, Siu Cheung},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {6135--6145},
  url       = {https://mlanthology.org/neurips/2019/tay2019neurips-compositional/}
}