Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions

Abstract

Self-attention models have made great strides toward accurately modeling a wide array of data modalities, including, more recently, graph-structured data. This paper demonstrates that adaptive hierarchical attention can go a long way toward successfully applying transformers to graphs. Our proposed model, Sequoia, provides a powerful inductive bias toward long-range interaction modeling, leading to better generalization. We propose an end-to-end mechanism for a data-dependent construction of a hierarchy, which in turn guides the self-attention mechanism. Using an adaptive hierarchy provides a natural pathway toward sparse attention by constraining node-to-node interactions to the immediate family of each node in the hierarchy (e.g., parent, children, and siblings). This in turn dramatically reduces the computational complexity of a self-attention layer from quadratic to log-linear in the input size while matching, and sometimes even surpassing, the standard transformer's ability to model long-range dependencies across the entire input. Experimentally, we report state-of-the-art performance on long-range graph benchmarks while remaining computationally efficient. Moving beyond graphs, we also demonstrate competitive performance on long-range sequence modeling, point-cloud classification, and segmentation when using a fixed hierarchy. Our source code is publicly available at https://github.com/HySonLab/HierAttention
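
As a rough illustration of the family-constrained attention pattern the abstract describes, here is a minimal PyTorch sketch, not the authors' implementation: each node attends only to itself, its parent, its children, and its siblings in a given hierarchy. The names family_mask and sparse_attention are hypothetical, and the learned, data-dependent hierarchy construction that Sequoia proposes is elided; a fixed parent array stands in for it.

import torch
import torch.nn.functional as F

def family_mask(parent):
    # parent[i] is the index of node i's parent, or -1 for the root.
    n = len(parent)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        mask[i, i] = True                      # every node attends to itself
        if parent[i] >= 0:
            mask[i, parent[i]] = True          # node -> parent
            mask[parent[i], i] = True          # parent -> child
        for j in range(n):
            # siblings: distinct nodes sharing the same (non-root) parent
            if i != j and parent[i] >= 0 and parent[i] == parent[j]:
                mask[i, j] = True
    return mask

def sparse_attention(q, k, v, mask):
    # Scaled dot-product attention with disallowed node pairs
    # masked out before the softmax.
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy hierarchy: node 0 is the root with children 1 and 2;
# nodes 3 and 4 are children of node 1.
parent = [-1, 0, 0, 1, 1]
x = torch.randn(len(parent), 8)                # 8-dim features per node
out = sparse_attention(x, x, x, family_mask(parent))
print(out.shape)                               # torch.Size([5, 8])

The dense boolean mask above is for clarity only: because each node's family is small, a practical implementation would use sparse attention kernels rather than an n-by-n mask to realize the log-linear cost the abstract reports.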

Cite

Text

Trang et al. "Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions." Transactions on Machine Learning Research, 2024.

Markdown

[Trang et al. "Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/trang2024tmlr-scalable/)

BibTeX

@article{trang2024tmlr-scalable,
  title     = {{Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions}},
  author    = {Trang, Thuan Nguyen Anh and Ngo, Khang Nhat and Sonnery, Hugo and Vo, Thieu and Ravanbakhsh, Siamak and Hy, Truong Son},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/trang2024tmlr-scalable/}
}