Aligning Transformers with Weisfeiler-Leman
Abstract
Graph neural network architectures aligned with the $k$-dimensional Weisfeiler–Leman ($k$-WL) hierarchy offer theoretically well-understood expressive power. However, these architectures often fail to deliver state-of-the-art predictive performance on real-world graphs, limiting their practical utility. While recent works aligning graph transformer architectures with the $k$-WL hierarchy have shown promising empirical results, employing transformers for higher orders of $k$ remains challenging due to a prohibitive runtime and memory complexity of self-attention as well as impractical architectural assumptions, such as an infeasible number of attention heads. Here, we advance the alignment of transformers with the $k$-WL hierarchy, showing stronger expressivity results for each $k$, making them more feasible in practice. In addition, we develop a theoretical framework that allows the study of established positional encodings such as Laplacian PEs and SPE. We evaluate our transformers on the large-scale PCQM4Mv2 dataset, showing competitive predictive performance with the state-of-the-art and demonstrating strong downstream performance when fine-tuning them on small-scale molecular datasets.
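Below is a minimal sketch, not taken from the paper, illustrating one of the established positional encodings the abstract refers to: Laplacian eigenvector PEs. It assumes an undirected graph given as a dense adjacency matrix; the function name `laplacian_pe` and the choice of `numpy` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def laplacian_pe(adj: np.ndarray, k: int = 8) -> np.ndarray:
    """Return k non-trivial Laplacian eigenvectors as per-node positional encodings.

    Sketch only: assumes `adj` is the dense adjacency matrix of an undirected graph.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    d_inv_sqrt[mask] = deg[mask] ** -0.5
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # Drop the trivial (constant) eigenvector, keep the next k; zero-pad small graphs.
    pe = eigvecs[:, 1 : k + 1]
    if pe.shape[1] < k:
        pe = np.pad(pe, ((0, 0), (0, k - pe.shape[1])))
    return pe

# Example: positional encodings for a 4-cycle.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
print(laplacian_pe(adj, k=2).shape)  # (4, 2)
```

In practice, such encodings are concatenated with node features before they enter the transformer; the paper's theoretical framework studies when encodings of this kind suffice for alignment with the $k$-WL hierarchy.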
Cite
Text
Müller and Morris. "Aligning Transformers with Weisfeiler-Leman." International Conference on Machine Learning, 2024.

Markdown
[Müller and Morris. "Aligning Transformers with Weisfeiler-Leman." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/muller2024icml-aligning/)

BibTeX
@inproceedings{muller2024icml-aligning,
title = {{Aligning Transformers with Weisfeiler-Leman}},
author = {Müller, Luis and Morris, Christopher},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {36654--36704},
volume = {235},
url = {https://mlanthology.org/icml/2024/muller2024icml-aligning/}
}