Mol-AE: Auto-Encoder Based Molecular Representation Learning with 3D Cloze Test Objective

Abstract

3D molecular representation learning has gained tremendous interest and achieved promising performance in various downstream tasks. A series of recent approaches follow a prevalent framework: an encoder-only model coupled with a coordinate denoising objective. However, through a series of analytical experiments, we show that the encoder-only model with a coordinate denoising objective exhibits inconsistency between pre-training and downstream objectives, as well as issues with disrupted atomic identifiers. To address these two issues, we propose Mol-AE for molecular representation learning, an auto-encoder model that uses positional encoding as atomic identifiers. We also propose a new training objective named 3D Cloze Test to make the model learn better atom spatial relationships from real molecular substructures. Empirical results demonstrate that Mol-AE outperforms the current state-of-the-art 3D molecular modeling approach by a large margin.
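The abstract does not spell out how the 3D Cloze Test is computed. As a rough illustration only, the sketch below assumes it amounts to hiding the coordinates of some atoms and scoring a model solely on how well it reconstructs them from the remaining, unperturbed atoms. The function names `split_for_cloze` and `cloze_loss`, the `drop_ratio` parameter, the uniform random choice of hidden atoms, and the squared-error loss are all hypothetical and are not taken from the paper.

```python
import random


def split_for_cloze(coords, drop_ratio=0.25, seed=0):
    """Split atom indices into visible atoms and atoms whose coordinates are hidden.

    Assumption: hidden atoms are chosen uniformly at random; the paper's
    actual selection strategy may differ.
    """
    rng = random.Random(seed)
    n = len(coords)
    n_drop = max(1, round(n * drop_ratio))
    dropped = sorted(rng.sample(range(n), n_drop))
    dropped_set = set(dropped)
    kept = [i for i in range(n) if i not in dropped_set]
    return kept, dropped


def cloze_loss(true_coords, pred_coords, dropped):
    """Mean squared coordinate error, computed over the hidden atoms only."""
    total = 0.0
    for i in dropped:
        total += sum((p - t) ** 2 for p, t in zip(pred_coords[i], true_coords[i]))
    return total / len(dropped)


# Toy four-atom "molecule" with made-up 3D coordinates.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.5, 0.5)]
kept, dropped = split_for_cloze(coords, drop_ratio=0.5)

# A stand-in "prediction" that shifts every atom by one unit along x.
shifted = [(x + 1.0, y, z) for x, y, z in coords]
loss = cloze_loss(coords, shifted, dropped)
```

In this toy setup the visible atoms keep their true coordinates, which is one way to read the abstract's phrase "real molecular substructures"; a coordinate denoising objective would instead perturb the input coordinates with noise.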

Cite

Text

Yang et al. "Mol-AE: Auto-Encoder Based Molecular Representation Learning with 3D Cloze Test Objective." International Conference on Machine Learning, 2024.

Markdown

[Yang et al. "Mol-AE: Auto-Encoder Based Molecular Representation Learning with 3D Cloze Test Objective." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/yang2024icml-molae/)

BibTeX

@inproceedings{yang2024icml-molae,
  title     = {{Mol-AE: Auto-Encoder Based Molecular Representation Learning with 3D Cloze Test Objective}},
  author    = {Yang, Junwei and Zheng, Kangjie and Long, Siyu and Nie, Zaiqing and Zhang, Ming and Dai, Xinyu and Ma, Wei-Ying and Zhou, Hao},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {56793--56811},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/yang2024icml-molae/}
}