GenMol: A Drug Discovery Generalist with Discrete Diffusion
Abstract
Drug discovery is a complex process that involves multiple stages and tasks. However, existing molecular generative models can only tackle some of these tasks. We present Generalist Molecular generative model (GenMol), a versatile framework that uses only a single discrete diffusion model to handle diverse drug discovery scenarios. GenMol generates Sequential Attachment-based Fragment Embedding (SAFE) sequences through non-autoregressive bidirectional parallel decoding, thereby allowing the utilization of a molecular context that does not rely on the specific token ordering while having better sampling efficiency. GenMol uses fragments as basic building blocks for molecules and introduces fragment remasking, a strategy that optimizes molecules by regenerating masked fragments, enabling effective exploration of chemical space. We further propose molecular context guidance (MCG), a guidance method tailored for masked discrete diffusion of GenMol. GenMol significantly outperforms the previous GPT-based model in de novo generation and fragment-constrained generation, and achieves state-of-the-art performance in goal-directed hit generation and lead optimization. These results demonstrate that GenMol can tackle a wide range of drug discovery tasks, providing a unified and versatile approach for molecular design.
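As an illustration of the fragment remasking idea described in the abstract, the following minimal sketch masks one fragment of a SAFE-style string so that a masked diffusion model could regenerate it. This is not the authors' implementation. The function name `remask_fragment`, the `[MASK]` token, the character-level tokenization, and the toy input string are all assumptions made for illustration, and the regeneration step (the diffusion model itself) is deliberately left out.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; the real vocabulary may differ


def remask_fragment(safe: str, rng: random.Random):
    """Replace one randomly chosen fragment with mask tokens.

    Assumes fragments are separated by '.' and that each character is one
    token, which is a simplification of real SAFE tokenization.
    Returns (list of token lists, index of the masked fragment).
    """
    fragments = safe.split(".")
    idx = rng.randrange(len(fragments))
    tokens = [list(frag) for frag in fragments]
    # Keep the original fragment length here; a real model might resample it.
    tokens[idx] = [MASK] * len(fragments[idx])
    return tokens, idx


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy two-fragment string, used only to exercise the function.
    masked, idx = remask_fragment("c1ccccc13.C3CO", rng)
    print(idx, masked)
```

In a full optimization loop, the masked positions would be passed to the diffusion model for parallel decoding, and the regenerated molecule would be kept only if it improves the target property.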
Cite
Text
Lee et al. "GenMol: A Drug Discovery Generalist with Discrete Diffusion." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Lee et al. "GenMol: A Drug Discovery Generalist with Discrete Diffusion." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/lee2025icml-genmol/)
BibTeX
@inproceedings{lee2025icml-genmol,
title = {{GenMol: A Drug Discovery Generalist with Discrete Diffusion}},
author = {Lee, Seul and Kreis, Karsten and Veccham, Srimukh Prasad and Liu, Meng and Reidenbach, Danny and Peng, Yuxing and Paliwal, Saee Gopal and Nie, Weili and Vahdat, Arash},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {33205-33226},
volume = {267},
url = {https://mlanthology.org/icml/2025/lee2025icml-genmol/}
}