Spectro: A Multi-Modal Approach for Molecule Elucidation Using IR and NMR Data

Abstract

Molecular structure elucidation is a crucial but fundamentally challenging step in the characterization of materials, given the large number of possible structures. Here, we introduce Spectro, a multi-modal approach for molecular structure elucidation that combines $^{13}\ce{C}$ and $^{1}\ce{H}$ NMR data with IR spectra. Spectro translates the embedded representations of the spectra into molecular structures written in SELFIES notation. To embed the IR data, we used a vision model pretrained to detect the peaks of relevant functional groups in IR spectra, a task on which it achieved an F1 score of 91\%. To embed the NMR data, we used LLM2Vec, treating each NMR spectrum as text. By integrating multiple spectroscopic techniques, Spectro achieves an overall test accuracy of 93\% when trained jointly with the IR vision model and 82\% when trained with fixed embeddings. Our approach demonstrates the potential of multi-modal learning for complex molecular characterization tasks.
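The abstract says NMR spectra are passed to LLM2Vec as text and that target structures are SELFIES strings, but it does not give the text format. The sketch below is a minimal, hypothetical illustration of both input and target preparation: the helper names `h_nmr_to_text`, `c_nmr_to_text`, and `split_selfies` and the peak-list layout are assumptions, not the authors' code. The ethanol peaks are approximate, illustrative values.

```python
import re

# Hypothetical text layout: the abstract says NMR spectra are fed to LLM2Vec as
# text but does not give the format, so this serialization is an assumption.
def h_nmr_to_text(peaks: list[tuple[float, str, int]]) -> str:
    """Render 1H peaks (shift in ppm, multiplicity, proton count) as text, high shift first."""
    ordered = sorted(peaks, key=lambda p: p[0], reverse=True)
    return "1H NMR: " + ", ".join(f"{s:.2f} ({m}, {n}H)" for s, m, n in ordered)


def c_nmr_to_text(shifts: list[float]) -> str:
    """Render 13C shifts (ppm) as text, high shift first."""
    return "13C NMR: " + ", ".join(f"{s:.1f}" for s in sorted(shifts, reverse=True))


# A SELFIES string is a concatenation of bracketed symbols such as [C] or [=O].
_SELFIES_SYMBOL = re.compile(r"\[[^\]]*\]")


def split_selfies(selfies: str) -> list[str]:
    """Split a SELFIES string into symbols, rejecting any text outside brackets."""
    symbols = _SELFIES_SYMBOL.findall(selfies)
    if "".join(symbols) != selfies:
        raise ValueError(f"not a well-formed SELFIES string: {selfies!r}")
    return symbols


if __name__ == "__main__":
    # Approximate, illustrative peaks for ethanol (SMILES CCO, SELFIES [C][C][O]).
    print(h_nmr_to_text([(1.22, "t", 3), (3.69, "q", 2), (2.61, "s", 1)]))
    print(c_nmr_to_text([18.1, 57.8]))
    print(split_selfies("[C][C][O]"))
```

Splitting targets at the symbol level means each decoding step emits one complete SELFIES symbol. The `selfies` Python package also provides its own `split_selfies`; the regex version here only avoids the dependency.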
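The 91\% F1 score for functional-group detection is a multi-label metric, since a single IR spectrum can contain several functional groups. The abstract does not state how the score is averaged, so the sketch below assumes micro-averaging, which pools every (spectrum, functional group) decision and computes F1 = 2TP / (2TP + FP + FN). The function name `micro_f1` and the group labels are hypothetical.

```python
# Assumption: micro-averaged F1; the abstract does not specify the averaging scheme.
def micro_f1(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Micro-averaged F1 over all (spectrum, functional group) decisions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)   # groups present and detected
        fp += len(pred - truth)   # groups detected but absent
        fn += len(truth - pred)   # groups present but missed
    denom = 2 * tp + fp + fn
    # Convention chosen here: 0.0 when there are no positive labels or predictions at all.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    truth = [{"alcohol", "alkane"}, {"carbonyl"}]   # hypothetical labels
    pred = [{"alcohol"}, {"carbonyl", "alkane"}]
    print(micro_f1(truth, pred))  # TP=2, FP=1, FN=1, so F1 = 4/6
```

Micro-averaging weights frequent functional groups more heavily; a macro average (F1 per group, then the mean) would instead weight rare groups equally, so the two can differ noticeably on imbalanced label sets.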

Cite

Text

Chacko et al. "Spectro: A Multi-Modal Approach for Molecule Elucidation Using IR and NMR Data." NeurIPS 2024 Workshops: AI4Mat, 2024.

Markdown

[Chacko et al. "Spectro: A Multi-Modal Approach for Molecule Elucidation Using IR and NMR Data." NeurIPS 2024 Workshops: AI4Mat, 2024.](https://mlanthology.org/neuripsw/2024/chacko2024neuripsw-spectro/)

BibTeX

@inproceedings{chacko2024neuripsw-spectro,
  title     = {{Spectro: A Multi-Modal Approach for Molecule Elucidation Using IR and NMR Data}},
  author    = {Chacko, Edwin and Sondhi, Rudra and Praveen, Arnav and Luska, Kylie L. and Vargas-Hernandez, Rodrigo},
  booktitle = {NeurIPS 2024 Workshops: AI4Mat},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/chacko2024neuripsw-spectro/}
}