GraTeD-MLP: Efficient Node Classification via Graph Transformer Distillation to MLP
Abstract
Graph Transformers (GTs) like NAGphormer have shown impressive performance by encoding a graph's structural information alongside node features. However, their self-attention and complex architectures require substantial computation and memory, hindering deployment. We therefore propose a novel framework called Graph Transformer Distillation to Multi-Layer Perceptron (GraTeD-MLP). GraTeD-MLP leverages knowledge distillation (KD) and a novel decomposition of the attentional representation to distill the learned representations from the teacher GT to a student MLP. During distillation, we employ a gated MLP architecture in which two branches learn a node's decomposed attentional representation while the third predicts node embeddings. Encoding the attentional representation mitigates the MLP's over-reliance on node features, enabling robust performance even in inductive settings. Empirical results demonstrate that GraTeD-MLP achieves significantly faster inference than the teacher GT model, with speed-ups ranging from 20× to 40×, while improving performance by up to 25% over a vanilla MLP. Furthermore, we show empirically that GraTeD-MLP outperforms other GNN distillation methods on seven datasets in both inductive and transductive settings.
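The gated three-branch student described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the module name `GraTeDMLPStudent`, the two-way split of the attentional representation, and the softmax gate are all assumptions made here for concreteness.

```python
# Minimal sketch of a gated three-branch student MLP (hypothetical names and
# structure; the paper's exact architecture may differ).
import torch
import torch.nn as nn


class GraTeDMLPStudent(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int, num_classes: int):
        super().__init__()
        # Two branches regress the decomposed attentional representation
        # distilled from the teacher GT (assumed split into two components).
        self.attn_branch_a = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim)
        )
        self.attn_branch_b = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim)
        )
        # Third branch predicts node embeddings used for classification.
        self.embed_branch = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim)
        )
        # Gate mixes the three branch outputs before the classifier head.
        self.gate = nn.Sequential(nn.Linear(3 * hid_dim, 3), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(hid_dim, num_classes)

    def forward(self, x: torch.Tensor):
        a = self.attn_branch_a(x)
        b = self.attn_branch_b(x)
        e = self.embed_branch(x)
        w = self.gate(torch.cat([a, b, e], dim=-1))        # (N, 3) branch weights
        h = w[:, 0:1] * a + w[:, 1:2] * b + w[:, 2:3] * e  # gated fusion
        # Return logits plus the two attention-branch outputs, which a
        # distillation loss could match against the teacher's decomposition.
        return self.classifier(h), (a, b)
```

In training, the two attention branches would presumably be regressed against the teacher's decomposed attentional representation (e.g., with an MSE loss) alongside the usual classification loss; at inference the student needs only raw node features and no graph structure, which is consistent with the reported speed-ups over the teacher GT.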
Cite
Text
Malik et al. "GraTeD-MLP: Efficient Node Classification via Graph Transformer Distillation to MLP." Proceedings of the Third Learning on Graphs Conference, 2025.
BibTeX
@inproceedings{malik2025log-gratedmlp,
title = {{GraTeD-MLP: Efficient Node Classification via Graph Transformer Distillation to MLP}},
author = {Malik, Sarthak and Rai, Aditi and V, Ram Ganesh and Sehgal, Himank and Sethi, Akshay and Malhotra, Aakarsh},
booktitle = {Proceedings of the Third Learning on Graphs Conference},
year = {2025},
pages = {20:1--20:15},
volume = {269},
url = {https://mlanthology.org/log/2025/malik2025log-gratedmlp/}
}