Molecular Generation with State Space Sequence Models

Abstract

Molecular generation is a critical task in drug discovery, but current approaches often struggle with efficiency and scalability when dealing with complex molecular structures. This paper addresses these challenges by training and evaluating models for molecular generation using the MAMBA State Space Model architecture. We develop models with 20M and 90M parameters trained on the MOSES and ZINC datasets, respectively, using the Sequential Attachment-based Fragment Embedding (SAFE) representation. We compare MAMBA models against the prevailing Transformer architecture in terms of generation quality and computational efficiency. Our findings suggest that MAMBA models can achieve performance comparable to Transformers in generating valid, unique, and diverse molecules. Both architectures can achieve near-perfect validity and uniqueness scores, although MAMBA models require more conservative sampling parameters or regeneration steps to do so. MAMBA models consistently demonstrate lower perplexity and reduced GPU power consumption (up to 30% reduction) compared to Transformer models. These results indicate that State Space Models may offer a computationally efficient alternative for molecular generation tasks, potentially enabling more efficient processing of larger datasets and complex molecular structures. The efficiency gains of MAMBA models become more pronounced with longer sequences, suggesting that this architecture could enable the modeling and generation of more complex molecules. This capability could significantly expand the scope of AI-driven molecular design in drug discovery.
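
As a rough illustration of the quantities the abstract reports, the following minimal Python sketch computes validity, uniqueness, and perplexity under their usual definitions: validity as the fraction of generated strings that pass a validity check, uniqueness as the fraction of valid strings that are distinct, and perplexity as the exponential of the mean negative log-probability per token. It is not the authors' evaluation code. The function names and the `toy_is_valid` check (balanced parentheses only) are hypothetical stand-ins; a real evaluation would parse and canonicalize each molecule with a cheminformatics toolkit before counting.

```python
import math
from typing import Callable, Iterable, Sequence


def validity(samples: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of generated strings accepted by the validity check."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if is_valid(s)) / len(samples)


def uniqueness(samples: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of valid samples that are distinct from one another."""
    valid = [s for s in samples if is_valid(s)]
    if not valid:
        return 0.0
    return len(set(valid)) / len(valid)


def perplexity(token_log_probs: Iterable[float]) -> float:
    """exp(mean negative natural-log probability per token)."""
    log_probs = list(token_log_probs)
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


def toy_is_valid(s: str) -> bool:
    """Hypothetical placeholder: accepts strings with balanced parentheses.

    This is NOT a chemical validity check; it only keeps the sketch
    free of external dependencies.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and len(s) > 0


if __name__ == "__main__":
    generated = ["CCO", "CCO", "C(C", "c1ccccc1"]
    print("validity:", validity(generated, toy_is_valid))
    print("uniqueness:", uniqueness(generated, toy_is_valid))
    # A model assigning probability 0.5 to every token has perplexity 2.
    print("perplexity:", perplexity([math.log(0.5)] * 4))
```

Under these definitions, more conservative sampling (for example a lower temperature) tends to raise validity at some cost to diversity, which is the trade-off the abstract alludes to when comparing the two architectures.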

Cite

Text

Lombard et al. "Molecular Generation with State Space Sequence Models." NeurIPS 2024 Workshops: AIDrugX, 2024.

Markdown

[Lombard et al. "Molecular Generation with State Space Sequence Models." NeurIPS 2024 Workshops: AIDrugX, 2024.](https://mlanthology.org/neuripsw/2024/lombard2024neuripsw-molecular/)

BibTeX

@inproceedings{lombard2024neuripsw-molecular,
  title     = {{Molecular Generation with State Space Sequence Models}},
  author    = {Lombard, Anri and Acton, Shane and Sob, Ulrich Armel Mbou and Buys, Jan},
  booktitle = {NeurIPS 2024 Workshops: AIDrugX},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/lombard2024neuripsw-molecular/}
}