Linear Transformer Topological Masking with Graph Random Features

Abstract

When training transformers on graph-structured data, incorporating information about the underlying topology is crucial for good performance. Topological masking, a type of relative position encoding, achieves this by upweighting or downweighting attention depending on the relationship between the query and keys in the graph. In this paper, we propose to parameterise topological masks as a learnable function of a weighted adjacency matrix -- a novel, flexible approach which incorporates a strong structural inductive bias. By approximating this mask with graph random features (for which we prove the first known concentration bounds), we show how this can be made fully compatible with linear attention, preserving $\mathcal{O}(N)$ time and space complexity with respect to the number of input tokens. The fastest previous alternative was $\mathcal{O}(N \log N)$ and only suitable for specific graphs. Our efficient masking algorithms provide strong performance gains for image and point cloud data, including inputs with $>30$k nodes.
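
To make the $\mathcal{O}(N)$ claim concrete, the sketch below (not the authors' code) shows how a topological mask that factorises as M ≈ Phi Psi^T can be folded into linear attention without ever materialising the N × N attention matrix. Here Phi and Psi are random positive stand-ins for the graph random features described in the abstract, the positive query/key feature map is a placeholder, and masked_linear_attention is a hypothetical helper; the point is only the algebraic identity (phi_i · psi_j)(q_i · k_j) = (phi_i ⊗ q_i) · (psi_j ⊗ k_j), which keeps the contraction linear in N.

import numpy as np

def masked_linear_attention(Q, K, V, Phi, Psi):
    """O(N) attention with a multiplicative mask M_ij ~= Phi[i] . Psi[j].

    Q, K : (N, d)   positive query/key features (after some feature map).
    V    : (N, d_v) values.
    Phi, Psi : (N, m) mask features, stand-ins for graph random features.
    """
    N, d = Q.shape
    m = Phi.shape[1]
    # Fold the mask features into queries and keys via an outer product,
    # using (phi_i . psi_j)(q_i . k_j) = (phi_i ⊗ q_i) . (psi_j ⊗ k_j).
    Qm = (Phi[:, :, None] * Q[:, None, :]).reshape(N, m * d)
    Km = (Psi[:, :, None] * K[:, None, :]).reshape(N, m * d)
    # Standard linear-attention contraction: the N x N matrix is never formed.
    num = Qm @ (Km.T @ V)             # (N, d_v)
    den = Qm @ Km.sum(axis=0)         # (N,)
    return num / den[:, None]

# Toy usage with random placeholders for the graph random features.
rng = np.random.default_rng(0)
N, d, d_v, m = 128, 16, 32, 8
Q = np.exp(rng.normal(size=(N, d)))    # positive query features
K = np.exp(rng.normal(size=(N, d)))    # positive key features
V = rng.normal(size=(N, d_v))
Phi = np.abs(rng.normal(size=(N, m)))  # placeholder mask features
Psi = np.abs(rng.normal(size=(N, m)))
out = masked_linear_attention(Q, K, V, Phi, Psi)
print(out.shape)  # (128, 32)

Because the cost of the contraction is O(N m d d_v), the complexity stays linear in the number of tokens, matching the abstract's claim; the paper's contribution is constructing Phi and Psi from graph random features so that Phi Psi^T approximates a learnable function of the weighted adjacency matrix.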

Cite

Text

Reid et al. "Linear Transformer Topological Masking with Graph Random Features." International Conference on Learning Representations, 2025.

Markdown

[Reid et al. "Linear Transformer Topological Masking with Graph Random Features." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/reid2025iclr-linear/)

BibTeX

@inproceedings{reid2025iclr-linear,
  title     = {{Linear Transformer Topological Masking with Graph Random Features}},
  author    = {Reid, Isaac and Dubey, Kumar Avinava and Jain, Deepali and Whitney, William F. and Ahmed, Amr and Ainslie, Joshua and Bewley, Alex and Jacob, Mithun George and Mehta, Aranyak and Rendleman, David and Schenck, Connor and Turner, Richard E. and Wagner, René and Weller, Adrian and Choromanski, Krzysztof Marcin},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/reid2025iclr-linear/}
}