Learning Invariant Molecular Representation in Latent Discrete Space

Abstract

Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments. To address this issue, we propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts. Specifically, we propose a strategy called ``first-encoding-then-separation'' to identify invariant molecule features in the latent space, which deviates from conventional practices. Prior to the separation step, we introduce a residual vector quantization module that mitigates the over-fitting to training data distributions while preserving the expressivity of encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification, which enables our method widely applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.

Cite

Text

Zhuang et al. "Learning Invariant Molecular Representation in Latent Discrete Space." Neural Information Processing Systems, 2023.

Markdown

[Zhuang et al. "Learning Invariant Molecular Representation in Latent Discrete Space." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/zhuang2023neurips-learning/)

BibTeX

@inproceedings{zhuang2023neurips-learning,
  title     = {{Learning Invariant Molecular Representation in Latent Discrete Space}},
  author    = {Zhuang, Xiang and Zhang, Qiang and Ding, Keyan and Bian, Yatao and Wang, Xiao and Lv, Jingsong and Chen, Hongyang and Chen, Huajun},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/zhuang2023neurips-learning/}
}