Model-Agnostic Graph Dataset Compression with the Tree Mover’s Distance

Abstract

Graph neural networks have demonstrated remarkable success across a variety of domains. However, the acquisition and management of largescale graph datasets poses several challenges. Acquiring graph-level labels can be prohibitively costly, especially for applications in the biosciences and combinatorial optimization. Storage and privacy constraints can pose additional challenges. In this work, we propose an approach for data subset selection for graph datasets, which downsamples graphs and nodes based on the Tree Mover’s Distance. We provide new efficient methods for computing the TMD in our setting; empirical results showing our approach outperforms other node and graph sampling methods; and theoretical results bounding the decrease in accuracy caused by training on the downsampled graphs. Surprisingly, we find that with our method, we can subsample down to 1% of the number of graphs and 10% of the number of nodes on some datasets, with minimal degradation in model accuracy.

Cite

Text

Jain et al. "Model-Agnostic Graph Dataset Compression with the Tree Mover’s Distance." ICML 2024 Workshops: WANT, 2024.

Markdown

[Jain et al. "Model-Agnostic Graph Dataset Compression with the Tree Mover’s Distance." ICML 2024 Workshops: WANT, 2024.](https://mlanthology.org/icmlw/2024/jain2024icmlw-modelagnostic/)

BibTeX

@inproceedings{jain2024icmlw-modelagnostic,
  title     = {{Model-Agnostic Graph Dataset Compression with the Tree Mover’s Distance}},
  author    = {Jain, Mika Sarkin and Jegelka, Stefanie and Karmarkar, Ishani and Ruiz, Luana and Vitercik, Ellen},
  booktitle = {ICML 2024 Workshops: WANT},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/jain2024icmlw-modelagnostic/}
}