Retrieval Augmented Zero-Shot Enzyme Generation for Specified Substrate
Abstract
Generating novel enzymes for target molecules in zero-shot scenarios is a fundamental challenge in biomaterial synthesis and chemical production. Without known enzymes for a target molecule, training generative models becomes difficult due to the lack of direct supervision. To address this, we propose a retrieval-augmented generation method that uses existing enzyme-substrate data to guide enzyme design. Our method retrieves enzymes with substrates that share structural similarities with the target molecule, leveraging functional similarities in catalytic activity. Since none of the retrieved enzymes directly catalyze the target molecule, we use a conditioned discrete diffusion model to generate new enzymes based on the retrieved examples. An enzyme-substrate relationship classifier guides the generation process to ensure optimal protein sequence distributions. We evaluate our model on enzyme design tasks with diverse real-world substrates and show that it outperforms existing protein generation methods in catalytic capability, foldability, and docking accuracy. Additionally, we define the zero-shot substrate-specified enzyme generation task and introduce a dataset with evaluation benchmarks.
Cite
Text
Du et al. "Retrieval Augmented Zero-Shot Enzyme Generation for Specified Substrate." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Du et al. "Retrieval Augmented Zero-Shot Enzyme Generation for Specified Substrate." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/du2025icml-retrieval/)
BibTeX
@inproceedings{du2025icml-retrieval,
title = {{Retrieval Augmented Zero-Shot Enzyme Generation for Specified Substrate}},
author = {Du, Jiahe and Zhou, Kaixiong and Hong, Xinyu and Xu, Zhaozhuo and Xu, Jinbo and Huang, Xiao},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {14719-14734},
volume = {267},
url = {https://mlanthology.org/icml/2025/du2025icml-retrieval/}
}