GraTeD-MLP: Efficient Node Classification via Graph Transformer Distillation to MLP
Abstract
Graph Transformers (GTs) like NAGphormer have shown impressive performance by encoding a graph's structural information alongside node features. However, their self-attention and complex architectures require substantial computation and memory, hindering deployment. We therefore propose a novel framework called Graph Transformer Distillation to Multi-Layer Perceptron (GraTeD-MLP). GraTeD-MLP leverages knowledge distillation (KD) and a novel decomposition of the attentional representation to distill the learned representations from the teacher GT to a student MLP. During distillation, we employ a gated MLP architecture in which two branches learn a node's decomposed attentional representation while the third predicts node embeddings. Encoding the attentional representation mitigates the MLP's over-reliance on node features, enabling robust performance even in inductive settings. Empirical results demonstrate that GraTeD-MLP achieves significantly faster inference than the teacher GT model, with speed-ups ranging from 20× to 40×, while improving performance by up to 25% over a vanilla MLP. Furthermore, we show empirically that GraTeD-MLP outperforms other GNN distillation methods on seven datasets in both inductive and transductive settings.
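The gated three-branch student described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the module name `GraTeDMLPStudent`, the two-way split of the attentional representation, and the softmax gate are all assumptions made here for concreteness.

```python
# Minimal sketch of a gated three-branch student MLP (hypothetical names and
# structure; the paper's exact architecture may differ).
import torch
import torch.nn as nn


class GraTeDMLPStudent(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int, num_classes: int):
        super().__init__()
        # Two branches regress the decomposed attentional representation
        # distilled from the teacher GT (assumed split into two components).
        self.attn_branch_a = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim)
        )
        self.attn_branch_b = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim)
        )
        # Third branch predicts node embeddings used for classification.
        self.embed_branch = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim)
        )
        # Gate mixes the three branch outputs before the classifier head.
        self.gate = nn.Sequential(nn.Linear(3 * hid_dim, 3), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(hid_dim, num_classes)

    def forward(self, x: torch.Tensor):
        a = self.attn_branch_a(x)
        b = self.attn_branch_b(x)
        e = self.embed_branch(x)
        w = self.gate(torch.cat([a, b, e], dim=-1))        # (N, 3) branch weights
        h = w[:, 0:1] * a + w[:, 1:2] * b + w[:, 2:3] * e  # gated fusion
        # Return logits plus the two attention-branch outputs, which a
        # distillation loss could match against the teacher's decomposition.
        return self.classifier(h), (a, b)
```

In training, the two attention branches would presumably be regressed against the teacher's decomposed attentional representation (e.g., with an MSE loss) alongside the usual classification loss; at inference the student needs only raw node features and no graph structure, which is consistent with the reported speed-ups over the teacher GT.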
Cite
Text
Malik et al. "GraTeD-MLP: Efficient Node Classification via Graph Transformer Distillation to MLP." Proceedings of the Third Learning on Graphs Conference, 2025.
BibTeX
@inproceedings{malik2025log-gratedmlp,
title = {{GraTeD-MLP: Efficient Node Classification via Graph Transformer Distillation to MLP}},
author = {Malik, Sarthak and Rai, Aditi and V, Ram Ganesh and Sehgal, Himank and Sethi, Akshay and Malhotra, Aakarsh},
booktitle = {Proceedings of the Third Learning on Graphs Conference},
year = {2025},
pages = {20:1--20:15},
volume = {269},
url = {https://mlanthology.org/log/2025/malik2025log-gratedmlp/}
}