MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning

Abstract

We propose MiniMol, an open-source foundation model for molecular machine learning that outperforms the previous best foundation model on 17 of 22 downstream tasks from the Therapeutics Data Commons (TDC) ADMET group while using ten times fewer parameters. This efficiency comes from a graph neural network (GNN) pre-trained on roughly 3,300 sparsely defined graph- and node-level tasks over a dataset of 6 million molecules with 500 million quantum and biological labels. Through multi-task, multi-level supervised training, the model learns embeddings that generalize well to a wide range of biological tasks and that simple Multi-Layer Perceptron (MLP) heads can use efficiently on downstream tasks, as our experiments demonstrate.
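The downstream protocol the abstract describes, frozen foundation-model embeddings fed to a small per-task MLP, can be sketched as follows. This is a minimal illustration, not the authors' code: the embedding dimension, the MLP sizes, and the `torch.randn` stand-in for MiniMol embeddings are all assumptions, and in practice the embeddings would be produced once by the pre-trained GNN and cached.

```python
# Minimal sketch (assumptions noted above) of probing frozen molecular
# embeddings with a small MLP head, as described in the abstract.
import torch
import torch.nn as nn

EMB_DIM = 512            # assumption: dimensionality of the foundation-model embedding
N_MOLS, N_TASKS = 1024, 1

class TaskHead(nn.Module):
    """Small MLP trained per downstream task on frozen embeddings."""
    def __init__(self, emb_dim: int, hidden: int = 256, out_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Placeholder for embeddings computed once by the pre-trained GNN;
# here random tensors stand in for real MiniMol outputs.
embeddings = torch.randn(N_MOLS, EMB_DIM)
labels = torch.randint(0, 2, (N_MOLS, N_TASKS)).float()  # e.g. a binary ADMET endpoint

head = TaskHead(EMB_DIM)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(head(embeddings), labels)
    loss.backward()
    opt.step()
```

Because the foundation model stays frozen, only the few thousand parameters of the MLP head are trained per task, which is what makes this setup cheap to run across many downstream benchmarks.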

Cite

Text

Klaser et al. "MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning." ICML 2024 Workshops: AccMLBio, 2024.

Markdown

[Klaser et al. "MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning." ICML 2024 Workshops: AccMLBio, 2024.](https://mlanthology.org/icmlw/2024/klaser2024icmlw-minimol/)

BibTeX

@inproceedings{klaser2024icmlw-minimol,
  title     = {{MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning}},
  author    = {Klaser, Kerstin and Banaszewski, Blazej and Maddrell-Mander, Samuel and McLean, Callum and Müller, Luis and Parviz, Ali and Huang, Shenyang and Fitzgibbon, Andrew W},
  booktitle = {ICML 2024 Workshops: AccMLBio},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/klaser2024icmlw-minimol/}
}