Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck

Abstract

This study aims to build a pre-trained Graph Neural Network (GNN) model on molecules without human annotations or prior knowledge. Although various approaches have been proposed to overcome the difficulty of acquiring labeled molecules, previous pre-training methods still rely on semantic subgraphs, i.e., functional groups. Focusing only on functional groups can overlook graph-level distinctions. The key challenges in building a pre-trained GNN on molecules are how to (1) generate well-distinguished graph-level representations and (2) automatically discover the functional groups without prior knowledge. To address these challenges, we propose a novel Subgraph-conditioned Graph Information Bottleneck, named S-CGIB, for pre-training GNNs to recognize core subgraphs (graph cores) and significant subgraphs. The main idea is that, under the S-CGIB principle, the graph cores contain compressed and sufficient information that can generate well-distinguished graph-level representations and reconstruct the input graph conditioned on significant subgraphs across molecules. To discover significant subgraphs without prior knowledge about functional groups, we propose generating a set of functional group candidates, i.e., ego networks, and using an attention-based interaction between the graph core and the candidates. Despite being identified through self-supervised learning, the learned subgraphs match real-world functional groups. Extensive experiments on molecule datasets across various domains demonstrate the superiority of S-CGIB.
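
The candidate-generation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the ego networks are built, so everything here is an assumption for illustration: the adjacency-dictionary representation, the `radius` parameter, and the function names `ego_network` and `ego_candidates` are hypothetical and are not taken from the S-CGIB implementation. The attention-based interaction with the graph core is not shown.

```python
from collections import deque


def ego_network(adj, center, radius=1):
    """Return the set of nodes within `radius` hops of `center` (BFS).

    `adj` maps each node to a list of its neighbours. This is an assumed
    representation, not the one used by the paper's authors.
    """
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def ego_candidates(adj, radius=1):
    """Build one ego-network candidate per node (hypothetical helper)."""
    return {v: ego_network(adj, v, radius) for v in adj}


# Toy heavy-atom graph of ethanol: C(0) - C(1) - O(2)
ethanol = {0: [1], 1: [0, 2], 2: [1]}
candidates = ego_candidates(ethanol, radius=1)
```

Each atom yields one candidate subgraph; in this toy example the candidate centred on the oxygen atom contains the C-O pair. How such candidates are scored against the graph core is left to the method itself.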

Cite

Text

Hoang and Lee. "Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I16.33891

Markdown

[Hoang and Lee. "Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/hoang2025aaai-pre/) doi:10.1609/AAAI.V39I16.33891

BibTeX

@inproceedings{hoang2025aaai-pre,
  title     = {{Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck}},
  author    = {Hoang, Van Thuy and Lee, O-Joun},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {17204-17213},
  doi       = {10.1609/AAAI.V39I16.33891},
  url       = {https://mlanthology.org/aaai/2025/hoang2025aaai-pre/}
}