Rethinking the Setting of Semi-Supervised Learning on Graphs

Abstract

We argue that the present setting of semi-supervised learning on graphs may result in unfair comparisons, due to its potential risk of over-tuning hyper-parameters for models. In this paper, we highlight the significant influence of tuning hyper-parameters, which leverages the label information in the validation set to improve the performance. To explore the limit of over-tuning hyper-parameters, we propose ValidUtil, an approach to fully utilize the label information in the validation set through an extra group of hyper-parameters. With ValidUtil, even GCN can easily reach a high accuracy of 85.8% on Cora. To avoid over-tuning, we merge the training set and the validation set and construct an i.i.d. graph benchmark (IGB) consisting of 4 datasets. Each dataset contains 100 i.i.d. graphs sampled from a large graph to reduce the evaluation variance. Our experiments suggest that IGB is a more stable benchmark than previous datasets for semi-supervised learning on graphs. Our code and data are released at https://github.com/THUDM/IGB/.
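The core mechanism the abstract describes can be sketched in a few lines. This is an illustrative toy, not the paper's implementation (all names here are hypothetical): treat the prediction for each validation node as its own extra hyper-parameter, and "tune" it by trying every class and keeping the one that maximizes validation accuracy. Since each such probe trivially recovers that node's true label, the tuning loop leaks validation labels into the final model.

```python
def tune_pseudo_labels(val_nodes, num_classes, val_accuracy):
    """For each validation node, try every class as an extra hyper-parameter
    and keep the one that maximizes validation accuracy (a per-node probe).

    val_accuracy(node, c) is assumed to return the validation accuracy
    obtained when `node` is predicted as class `c`.
    """
    pseudo = {}
    for node in val_nodes:
        best_class, best_acc = None, -1.0
        for c in range(num_classes):
            acc = val_accuracy(node, c)
            if acc > best_acc:
                best_class, best_acc = c, acc
        pseudo[node] = best_class
    return pseudo


# Toy demonstration: the validation-accuracy oracle is effectively a label
# lookup, so "hyper-parameter tuning" recovers the hidden labels exactly.
true_labels = {0: 2, 1: 0, 2: 1}
oracle = lambda node, c: 1.0 if true_labels[node] == c else 0.0
recovered = tune_pseudo_labels(true_labels.keys(), 3, oracle)
# recovered == true_labels
```

This is why the paper argues for merging the training and validation sets in the IGB benchmark: once validation labels can be absorbed through hyper-parameters, validation accuracy stops being an honest model-selection signal.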

Cite

Text

Li et al. "Rethinking the Setting of Semi-Supervised Learning on Graphs." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/450

Markdown

[Li et al. "Rethinking the Setting of Semi-Supervised Learning on Graphs." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/li2022ijcai-rethinking/) doi:10.24963/IJCAI.2022/450

BibTeX

@inproceedings{li2022ijcai-rethinking,
  title     = {{Rethinking the Setting of Semi-Supervised Learning on Graphs}},
  author    = {Li, Ziang and Ding, Ming and Li, Weikai and Wang, Zihan and Zeng, Ziyu and Cen, Yukuo and Tang, Jie},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {3243--3249},
  doi       = {10.24963/IJCAI.2022/450},
  url       = {https://mlanthology.org/ijcai/2022/li2022ijcai-rethinking/}
}