Pre-Training via Denoising for Molecular Property Prediction

Abstract

Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. Relying on the well-known link between denoising autoencoders and score-matching, we show that the denoising objective corresponds to learning a molecular force field -- arising from approximating the Boltzmann distribution with a mixture of Gaussians -- directly from equilibrium structures. Our experiments demonstrate that using this pre-training objective significantly improves performance on multiple benchmarks, achieving state-of-the-art results on the majority of targets in the widely used QM9 dataset. Our analysis then provides practical insights into the effects of different factors -- dataset sizes, model size and architecture, and the choice of upstream/downstream datasets -- on pre-training.
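
To make the objective concrete, below is a minimal sketch of the denoising pre-training loss the abstract describes: equilibrium coordinates are perturbed with Gaussian noise and the network regresses the noise. The names `denoising_loss`, `model`, `atom_types`, `coords`, and `sigma` are illustrative placeholders, not the authors' API; any network that maps (atom types, 3D coordinates) to one 3-vector per atom could stand in for `model`. This is a sketch under those assumptions, not the paper's implementation.

```python
import torch

def denoising_loss(model, atom_types, coords, sigma=0.1):
    """Denoising pre-training objective (sketch).

    Perturb equilibrium atom positions with Gaussian noise and train the
    model to predict the noise. By the standard denoising/score-matching
    link, the optimal predictor is (up to scale) the score of the
    Gaussian-smoothed structure distribution, which -- under the
    mixture-of-Gaussians approximation of the Boltzmann distribution --
    corresponds to a molecular force field.

    atom_types: (N,) long tensor of atomic numbers.
    coords:     (N, 3) float tensor of equilibrium positions.
    sigma:      noise scale (hyperparameter).
    """
    noise = torch.randn_like(coords)              # eps ~ N(0, I)
    noisy_coords = coords + sigma * noise         # x_tilde = x + sigma * eps
    pred = model(atom_types, noisy_coords)        # predicted noise, (N, 3)
    return ((pred - noise) ** 2).sum(-1).mean()   # per-atom MSE
```

In this setup, the model is first trained with this loss on a large dataset of equilibrium structures, and the learned representation is then fine-tuned on the downstream property-prediction task.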

Cite

Text

Zaidi et al. "Pre-Training via Denoising for Molecular Property Prediction." NeurIPS 2022 Workshops: AI4Science, 2022.

Markdown

[Zaidi et al. "Pre-Training via Denoising for Molecular Property Prediction." NeurIPS 2022 Workshops: AI4Science, 2022.](https://mlanthology.org/neuripsw/2022/zaidi2022neuripsw-pretraining/)

BibTeX

@inproceedings{zaidi2022neuripsw-pretraining,
  title     = {{Pre-Training via Denoising for Molecular Property Prediction}},
  author    = {Zaidi, Sheheryar and Schaarschmidt, Michael and Martens, James and Kim, Hyunjik and Teh, Yee Whye and Sanchez-Gonzalez, Alvaro and Battaglia, Peter and Pascanu, Razvan and Godwin, Jonathan},
  booktitle = {NeurIPS 2022 Workshops: AI4Science},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/zaidi2022neuripsw-pretraining/}
}