Continuous Self-Attention Models with Neural ODE Networks
Abstract
Stacked self-attention models have received widespread attention due to their ability to capture global dependencies among words. However, stacking many layers and components produces a huge number of parameters, leading to low parameter efficiency. In response to this issue, we propose a lightweight architecture named Continuous Self-Attention models with neural ODE networks (CSAODE). In CSAODE, continuous dynamical models (i.e., neural ODEs) are coupled with our proposed self-attention block to form a self-attention ODE solver. This solver continuously computes and refines the hidden states with only one layer of parameters, improving parameter efficiency. In addition, we design a novel accelerated continuous dynamical model to reduce computing costs and integrate it into CSAODE. Moreover, since the original self-attention ignores local information, CSAODE uses N-gram convolution to encode local representations, and a fusion layer with only two trainable scalars is designed to generate sentence vectors. We perform a series of experiments on text classification, natural language inference (NLI), and text matching tasks. With fewer parameters, CSAODE outperforms state-of-the-art models on text classification tasks (e.g., a 1.3% accuracy improvement on SUBJ) and achieves competitive performance on NLI and text matching tasks as well.
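To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of the architecture the abstract describes: a single self-attention block reused as the dynamics function of an ODE, a simple fixed-step Euler integrator standing in for the ODE solver, an N-gram convolution for local features, and two trainable scalars fusing the two branches into a sentence vector. This is not the authors' implementation; the class names, the Euler integration, the mean-pooling, and all hyperparameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class SelfAttentionODEFunc(nn.Module):
    """One self-attention block reused as the ODE dynamics f(t, h) at every time step."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, t, h):
        # h: (batch, seq_len, dim); the same single layer of parameters is applied at every t
        out, _ = self.attn(h, h, h)
        return self.norm(out)


class CSAODESketch(nn.Module):
    """Hypothetical sketch: attention-ODE global branch + N-gram conv local branch,
    fused by two trainable scalars (alpha, beta) into a sentence vector."""

    def __init__(self, dim, ngram=3, steps=4):
        super().__init__()
        self.func = SelfAttentionODEFunc(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=ngram, padding=ngram // 2)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # weight of the global branch
        self.beta = nn.Parameter(torch.tensor(0.5))   # weight of the local branch
        self.steps = steps

    def forward(self, x):
        # Fixed-step Euler integration of dh/dt = f(t, h) from t = 0 to t = 1
        h, dt = x, 1.0 / self.steps
        for i in range(self.steps):
            h = h + dt * self.func(i * dt, h)
        # N-gram convolution over the sequence dimension for local representations
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        global_vec = h.mean(dim=1)      # pool the continuous self-attention branch
        local_vec = local.mean(dim=1)   # pool the local branch
        return self.alpha * global_vec + self.beta * local_vec


# Usage: encode a batch of 8 sentences of length 20 with 64-dim embeddings
model = CSAODESketch(dim=64)
sentence_vectors = model(torch.randn(8, 20, 64))  # -> (8, 64)
```

In this reading, parameter efficiency comes from reusing the single attention block across all integration steps instead of stacking separately parameterized layers; a higher-order or adaptive solver could replace the Euler loop without changing the parameter count.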
Cite
Text
Zhang et al. "Continuous Self-Attention Models with Neural ODE Networks." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I16.17692Markdown
[Zhang et al. "Continuous Self-Attention Models with Neural ODE Networks." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/zhang2021aaai-continuous/) doi:10.1609/AAAI.V35I16.17692BibTeX
@inproceedings{zhang2021aaai-continuous,
title = {{Continuous Self-Attention Models with Neural ODE Networks}},
author = {Zhang, Jing and Zhang, Peng and Kong, Baiwen and Wei, Junqiu and Jiang, Xin},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2021},
pages = {14393-14401},
doi = {10.1609/AAAI.V35I16.17692},
url = {https://mlanthology.org/aaai/2021/zhang2021aaai-continuous/}
}