Relieving the Over-Aggregating Effect in Graph Transformers

Abstract

Graph attention has demonstrated superior performance in graph learning tasks. However, learning from global interactions can be challenging due to the large number of nodes. In this paper, we discover a new phenomenon termed over-aggregating. Over-aggregating arises when a large volume of messages is aggregated into a single node with little discrimination, diluting the key messages and causing potential information loss. To address this, we propose Wideformer, a plug-and-play method for graph attention. Wideformer divides the aggregation over all nodes into parallel processes and guides the model to focus on specific subsets of these processes. The division limits the input volume per aggregation, avoiding message dilution and reducing information loss. The guiding step sorts and weights the aggregation outputs, prioritizing the informative messages. Evaluations show that Wideformer effectively mitigates over-aggregating: the backbone methods can focus on the informative messages and achieve superior performance compared to baseline methods.
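
To make the mechanism described above concrete, the snippet below is a minimal, self-contained PyTorch sketch of the general idea: split a global attention aggregation into parallel groups of source nodes (limiting the input volume per aggregation), then rank and down-weight the per-group outputs so the more informative groups dominate. The class name WideAttentionSketch, the num_groups parameter, the informativeness score, and the rank-based decay are all illustrative assumptions for this sketch, not the authors' implementation of Wideformer.

import torch
import torch.nn as nn

class WideAttentionSketch(nn.Module):
    """Sketch: split global attention aggregation into parallel groups of
    source nodes, then sort and down-weight the per-group outputs so the
    most informative groups dominate the final message."""

    def __init__(self, dim: int, num_groups: int = 4):
        super().__init__()
        self.num_groups = num_groups
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim) node features of one graph
        n, d = x.shape
        g = self.num_groups
        assert n % g == 0, "sketch assumes num_nodes divisible by num_groups"
        q, k, v = self.q(x), self.k(x), self.v(x)

        # Divide: each aggregation now sees only n / g source nodes,
        # limiting the input volume per aggregation.
        k = k.view(g, n // g, d)
        v = v.view(g, n // g, d)
        scores = torch.einsum("nd,gmd->gnm", q, k) / d ** 0.5
        attn = scores.softmax(dim=-1)
        group_out = torch.einsum("gnm,gmd->gnd", attn, v)    # (g, n, d)

        # Guide: rank each group's output per target node by a simple
        # informativeness score, then apply a rank-based decay so the
        # highest-ranked groups contribute most to the final message.
        group_score = scores.amax(dim=-1)                     # (g, n)
        order = group_score.argsort(dim=0, descending=True)   # (g, n)
        sorted_out = group_out.gather(0, order.unsqueeze(-1).expand(-1, -1, d))
        decay = 0.5 ** torch.arange(g, device=x.device, dtype=x.dtype)
        weights = decay / decay.sum()                         # (g,)
        return torch.einsum("g,gnd->nd", weights, sorted_out)

A quick usage example under the same assumptions: x = torch.randn(32, 64) gives 32 nodes with 64-dimensional features, and WideAttentionSketch(dim=64)(x) returns a (32, 64) tensor of aggregated messages.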

Cite

Text

Sun et al. "Relieving the Over-Aggregating Effect in Graph Transformers." Advances in Neural Information Processing Systems, 2025.

Markdown

[Sun et al. "Relieving the Over-Aggregating Effect in Graph Transformers." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/sun2025neurips-relieving/)

BibTeX

@inproceedings{sun2025neurips-relieving,
  title     = {{Relieving the Over-Aggregating Effect in Graph Transformers}},
  author    = {Sun, Junshu and Chang, Wanxing and Yang, Chenxue and Huang, Qingming and Wang, Shuhui},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/sun2025neurips-relieving/}
}