Deep Interactions for Multimodal Molecular Property Prediction

Abstract

Multimodal learning that leverages both 2D graph and 3D point cloud information has become a prevalent approach to improving model performance in molecular property prediction. However, many recent techniques focus on specific pre-training tasks, such as contrastive learning, feature blending, and atom/subgraph masking, to learn multimodal representations, even though the design of the model architecture also strongly affects both pre-training and downstream task performance. Relying on pre-training tasks alone to align the 2D and 3D modalities lacks the direct cross-modal interaction that may be more effective for multimodal learning. In this work, we propose MolInteract, a simple yet effective architecture-focused approach to multimodal molecule learning that addresses these challenges. MolInteract leverages an interaction layer to fuse 2D and 3D information and foster cross-modal alignment, achieving strong results even with the simplest pre-training objectives, such as predicting features of the 3D point cloud and 2D graph. MolInteract outperforms several state-of-the-art multimodal pre-training techniques and architectures on a variety of downstream 2D and 3D molecular property prediction benchmarks.
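To make the abstract's central idea concrete, below is a minimal sketch of what an "interaction layer" fusing per-atom 2D graph embeddings with 3D point-cloud embeddings might look like, using bidirectional cross-attention. This is an illustration under stated assumptions, not the paper's actual implementation: the function name `interaction_layer`, the shared projection weights, and the residual updates are all hypothetical choices made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_layer(h2d, h3d, rng):
    """Fuse per-atom 2D graph embeddings with 3D point-cloud embeddings
    via bidirectional cross-attention.

    Hypothetical sketch: not the layer described in the paper. Projection
    weights are random here; in practice they would be learned, and the
    2D->3D and 3D->2D directions would typically have separate parameters.
    """
    n, d = h2d.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # 2D atom queries attend over 3D point keys/values.
    attn_2to3 = softmax((h2d @ Wq) @ (h3d @ Wk).T / np.sqrt(d))
    h2d_new = h2d + attn_2to3 @ (h3d @ Wv)  # residual cross-modal update
    # 3D point queries attend over 2D atom keys/values (weights shared for brevity).
    attn_3to2 = softmax((h3d @ Wq) @ (h2d @ Wk).T / np.sqrt(d))
    h3d_new = h3d + attn_3to2 @ (h2d @ Wv)
    return h2d_new, h3d_new

rng = np.random.default_rng(0)
h2d = rng.standard_normal((5, 16))  # 5 atoms, 16-dim 2D graph features
h3d = rng.standard_normal((5, 16))  # matching 16-dim 3D point-cloud features
f2d, f3d = interaction_layer(h2d, h3d, rng)
print(f2d.shape, f3d.shape)  # each modality keeps its (atoms, dim) shape
```

The key property such a layer provides, in contrast to purely loss-based alignment (contrastive objectives, masking), is that each modality's representation is directly conditioned on the other modality at every forward pass, during both pre-training and fine-tuning.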

Cite

Text

Soga et al. "Deep Interactions for Multimodal Molecular Property Prediction." NeurIPS 2024 Workshops: AIDrugX, 2024.

Markdown

[Soga et al. "Deep Interactions for Multimodal Molecular Property Prediction." NeurIPS 2024 Workshops: AIDrugX, 2024.](https://mlanthology.org/neuripsw/2024/soga2024neuripsw-deep/)

BibTeX

@inproceedings{soga2024neuripsw-deep,
  title     = {{Deep Interactions for Multimodal Molecular Property Prediction}},
  author    = {Soga, Patrick and Lei, Zhenyu and Bilodeau, Camille L. and Li, Jundong},
  booktitle = {NeurIPS 2024 Workshops: AIDrugX},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/soga2024neuripsw-deep/}
}