Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction

Abstract

Building $\textit{in silico}$ models to predict chemical properties and activities is a crucial step in drug discovery. However, limited labeled data often hinders the application of deep learning in this setting. Meanwhile, advances in meta-learning have enabled state-of-the-art performance on few-shot learning benchmarks, naturally prompting the question: Can meta-learning improve deep learning performance in low-resource drug discovery projects? In this work, we assess the transferability of graph neural network initializations learned by the Model-Agnostic Meta-Learning (MAML) algorithm (and its variants FO-MAML and ANIL) on chemical property and activity prediction tasks. Using the ChEMBL20 dataset to emulate low-resource settings, our benchmark shows that meta-initializations perform comparably to or outperform multi-task pre-training baselines on 16 out of 20 in-distribution tasks and on all out-of-distribution tasks, providing average improvements in AUPRC of 11.2% and 26.9%, respectively. Finally, we observe that meta-initializations consistently result in the best-performing models across fine-tuning sets with $k \in \{16, 32, 64, 128, 256\}$ instances.
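To make the FO-MAML variant mentioned in the abstract concrete, the following is a minimal sketch of its inner/outer loop on toy linear-regression tasks. This is an illustration of the general algorithm only, not the paper's GNN setup: the model, task construction, and hyperparameters here are all assumptions chosen for simplicity.

```python
import numpy as np

def loss_grad(w, X, y):
    # Gradient of mean squared error for a linear model y_hat = X @ w.
    return 2.0 * X.T @ (X @ w - y) / len(y)

def fo_maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05, inner_steps=1):
    """One first-order MAML (FO-MAML) meta-update over a batch of tasks.

    Each task is (X_support, y_support, X_query, y_query). FO-MAML adapts
    a copy of the meta-parameters on the support set, then applies the
    query-set gradient evaluated at the adapted parameters directly to the
    meta-parameters, ignoring second-order terms.
    """
    meta_grad = np.zeros_like(w)
    for Xs, ys, Xq, yq in tasks:
        w_task = w.copy()
        for _ in range(inner_steps):                 # inner-loop adaptation
            w_task -= inner_lr * loss_grad(w_task, Xs, ys)
        meta_grad += loss_grad(w_task, Xq, yq)       # first-order meta-gradient
    return w - outer_lr * meta_grad / len(tasks)

# Toy demo: tasks are 1-D regressions that differ only in slope.
rng = np.random.default_rng(0)

def make_task(slope, n=16):
    Xs = rng.normal(size=(n, 1)); ys = slope * Xs[:, 0]
    Xq = rng.normal(size=(n, 1)); yq = slope * Xq[:, 0]
    return Xs, ys, Xq, yq

tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
w = np.zeros(1)
for _ in range(100):
    w = fo_maml_step(w, tasks)
# The meta-initialization settles where one inner step adapts well to
# every task -- near the middle of the slope range here.
```

The same two-loop structure applies when `w` is the parameter vector of a graph neural network and each task is a ChEMBL assay, as in the paper; full MAML additionally backpropagates through the inner-loop updates, which FO-MAML omits for efficiency.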

Cite

Text

Nguyen et al. "Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction." ICML 2020 Workshops: LifelongML, 2020.

Markdown

[Nguyen et al. "Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction." ICML 2020 Workshops: LifelongML, 2020.](https://mlanthology.org/icmlw/2020/nguyen2020icmlw-metalearning/)

BibTeX

@inproceedings{nguyen2020icmlw-metalearning,
  title     = {{Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction}},
  author    = {Nguyen, Cuong Q. and Kreatsoulas, Constantine and Branson, Kim M.},
  booktitle = {ICML 2020 Workshops: LifelongML},
  year      = {2020},
  url       = {https://mlanthology.org/icmlw/2020/nguyen2020icmlw-metalearning/}
}