MolGen-Transformer: An Open-Source Self-Supervised Model for Molecular Generation and Latent Space Exploration
Abstract
We present MolGen-Transformer, a generative AI model that achieves 100% reconstruction accuracy through self-supervised training on a large, curated meta-dataset of organic molecules with fewer than 168 atoms. MolGen-Transformer produces valid molecular structures using the SELF-referencing Embedded Strings (SELFIES) representation. Our training dataset comprises 198 million organic molecules, selected to cover a wide range of organic structures. We illustrate the model's generative capability in three ways: (a) generating chemically similar molecules, where the model creates valid molecules that are structurally similar to a given prompt molecule; (b) producing diverse molecules, where the model creates structurally diverse valid molecules from a random latent seed; and (c) identifying chemical intermediates, where the model creates a sequence of valid molecules connecting two given molecules. MolGen-Transformer thus enables the generation and exploration of structurally similar molecules and provides insight into structural pathways between molecules. The model weights and inference methods are publicly available to support community use, and we also provide an easy-to-use website for exploration.
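The abstract does not say how the latent space is sampled for the three generation modes. One common way to realize them in a continuous latent space is: (a) add small Gaussian noise to the prompt molecule's latent vector, (b) draw latents from a standard normal prior, and (c) decode evenly spaced points on the straight line between two molecules' latents. The plain-Python sketch below illustrates only those latent-space operations, under those assumptions. The function names, the noise scale `sigma`, the normal prior, linear interpolation, and the `decode` callable (a stand-in for the model's decoder) are all hypothetical; this is not the authors' published inference code.

```python
import random
from typing import Callable, List, Sequence

Latent = List[float]


def perturb(z: Sequence[float], sigma: float, n: int,
            rng: random.Random) -> List[Latent]:
    """Mode (a), assumed: n latents near prompt latent z via Gaussian noise of scale sigma."""
    return [[zi + rng.gauss(0.0, sigma) for zi in z] for _ in range(n)]


def random_seeds(dim: int, n: int, rng: random.Random) -> List[Latent]:
    """Mode (b), assumed: n latents drawn from a standard normal prior N(0, I)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def interpolate(z_start: Sequence[float], z_end: Sequence[float],
                steps: int) -> List[Latent]:
    """Mode (c), assumed: `steps` evenly spaced points from z_start to z_end, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be >= 2")
    if len(z_start) != len(z_end):
        raise ValueError("latents must have the same dimension")
    path: List[Latent] = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 to 1.0
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


def collapse_repeats(strings: Sequence[str]) -> List[str]:
    """Drop consecutive duplicates so each run of identical decodings appears once."""
    out: List[str] = []
    for s in strings:
        if not out or out[-1] != s:
            out.append(s)
    return out


def intermediates(z_a: Sequence[float], z_b: Sequence[float], steps: int,
                  decode: Callable[[Sequence[float]], str]) -> List[str]:
    """Decode an interpolation path into a sequence of distinct neighboring strings."""
    return collapse_repeats([decode(z) for z in interpolate(z_a, z_b, steps)])


if __name__ == "__main__":
    # Hypothetical toy decoder; a real model would return a SELFIES string.
    def toy_decode(z: Sequence[float]) -> str:
        return "[C]" if z[0] < 0.5 else "[C][O]"

    print(intermediates([0.0, 0.0], [1.0, 1.0], 5, toy_decode))
```

Collapsing consecutive repeats matters for mode (c) because neighboring latent points can decode to the same string, so a path of many latent steps may yield only a few distinct molecules. Since any SELFIES string decodes to a valid molecule, a decoder that emits SELFIES would not need a separate validity filter at this stage.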
Cite
Yang et al. "MolGen-Transformer: An Open-Source Self-Supervised Model for Molecular Generation and Latent Space Exploration." NeurIPS 2024 Workshops: AI4Mat, 2024.
@inproceedings{yang2024neuripsw-molgentransformer,
title = {{MolGen-Transformer: An Open-Source Self-Supervised Model for Molecular Generation and Latent Space Exploration}},
author = {Yang, Chih-Hsuan and Duke, Rebekah and Sornberger, Parker Delaney and Ogbaje, Moses and Risko, Chad and Ganapathysubramanian, Baskar},
booktitle = {NeurIPS 2024 Workshops: AI4Mat},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/yang2024neuripsw-molgentransformer/}
}