Towards Principled Graph Transformers

Abstract

The expressive power of graph learning architectures based on the $k$-dimensional Weisfeiler-Leman ($k$-WL) hierarchy is well understood. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the $k$-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has 3-WL expressive power when provided with the right tokenization. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance while not relying on positional or structural encodings.
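To make the architecture described in the abstract concrete, the sketch below shows the core idea of the Edge Transformer: the model keeps one token per ordered node pair and updates pair (i, j) with a "triangular" attention over all intermediate nodes l, combining the tokens of pairs (i, l) and (l, j). This is a minimal, single-head illustration only, not the authors' implementation; the class name `TriangularAttention`, the projection names `v1`/`v2`, and the example tokenization of pair tokens from node features and adjacency are assumptions made for this sketch.

```python
import torch
import torch.nn as nn


class TriangularAttention(nn.Module):
    """Minimal single-head sketch of attention over node-pair tokens.

    The state is a tensor x of shape (n, n, d): one d-dimensional token per
    ordered node pair (i, j). Pair (i, j) attends over intermediate nodes l,
    and the value for l mixes the tokens of pairs (i, l) and (l, j), which is
    the triangular aggregation pattern behind the 3-WL-style expressivity.
    """

    def __init__(self, d: int):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)
        self.v1 = nn.Linear(d, d)
        self.v2 = nn.Linear(d, d)
        self.out = nn.Linear(d, d)
        self.scale = d ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n, n, d) node-pair tokens
        q = self.q(x)  # query for pair (i, j)
        k = self.k(x)  # key for pair (i, l)
        # score[i, j, l] = <q_{ij}, k_{il}> / sqrt(d), shape (n, n, n)
        scores = torch.einsum('ijd,ild->ijl', q, k) * self.scale
        att = scores.softmax(dim=-1)
        # out[i, j] = sum_l att[i, j, l] * (v1(x_{il}) * v2(x_{lj}))
        v = torch.einsum('ijl,ild,ljd->ijd', att, self.v1(x), self.v2(x))
        return self.out(v)


# Example usage with a hypothetical pair tokenization of a small graph:
n, d = 5, 16
adj = torch.randint(0, 2, (n, n)).float()   # adjacency matrix
node_feat = torch.randn(n, d)               # node features
# pair token (i, j): sum of the two node embeddings plus an edge indicator
x = node_feat.unsqueeze(1) + node_feat.unsqueeze(0) + adj.unsqueeze(-1)
out = TriangularAttention(d)(x)             # (n, n, d) updated pair tokens
```

Because every pair token already carries structural information about the graph (here via the adjacency indicator), no additional positional or structural encodings are needed, which matches the abstract's claim.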

Cite

Text

Müller et al. "Towards Principled Graph Transformers." Neural Information Processing Systems, 2024. doi:10.52202/079017-4026

Markdown

[Müller et al. "Towards Principled Graph Transformers." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/muller2024neurips-principled/) doi:10.52202/079017-4026

BibTeX

@inproceedings{muller2024neurips-principled,
  title     = {{Towards Principled Graph Transformers}},
  author    = {Müller, Luis and Kusuma, Daniel and Bonet, Blai and Morris, Christopher},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4026},
  url       = {https://mlanthology.org/neurips/2024/muller2024neurips-principled/}
}