AlphaFold Meets Flow Matching for Generating Protein Ensembles
Abstract
The biological functions of proteins often depend on dynamic structural ensembles. In this work, we develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins. We repurpose highly accurate single-state predictors such as AlphaFold and ESMFold and fine-tune them under a custom flow matching framework to obtain sequence-conditioned generative models of protein structure called AlphaFlow and ESMFlow. When trained and evaluated on the PDB, our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling. When further trained on ensembles from all-atom MD, our method accurately captures conformational flexibility, positional distributions, and higher-order ensemble observables for unseen proteins. Moreover, our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories, demonstrating its potential as a proxy for expensive physics-based simulations. Code is available at https://github.com/bjing2016/alphaflow.
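The abstract describes fine-tuning single-state structure predictors under a flow matching framework. The sketch below is a generic illustration of flow matching on 3D coordinates and is not the AlphaFlow/ESMFlow implementation from the linked repository. It shows how a training input is built by interpolating between noise and a clean structure, and how a model that predicts the clean structure can be integrated from noise into a sample. Several choices are assumptions made for illustration: the isotropic Gaussian prior, the straight-line interpolation path, the plain Euler integrator, and the hypothetical names `sample_training_pair`, `euler_sample`, and `denoise`. Predicting the clean structure, rather than a velocity, is one natural way to reuse a network that already outputs structures. Here that prediction is converted into a velocity on the fly.

```python
import numpy as np


def sample_training_pair(x1, rng):
    """Draw (t, x_t) for one flow-matching training example.

    x1: clean structure, shape (N, 3). The isotropic Gaussian prior used
    here is an assumption for illustration; other priors are possible.
    A model that predicts the clean structure would be regressed onto x1.
    """
    t = rng.uniform(0.0, 1.0)
    x0 = rng.standard_normal(x1.shape)   # noise sample from the prior
    x_t = (1.0 - t) * x0 + t * x1        # straight-line path from noise to data
    return t, x_t


def euler_sample(denoise, shape, rng, n_steps=10):
    """Integrate from noise at t=0 to a structure at t=1 with Euler steps.

    `denoise(x_t, t)` is a hypothetical model returning a predicted clean
    structure x1_hat. On the linear path, (x1 - x_t) / (1 - t) equals
    x1 - x0, so this expression gives the velocity to follow.
    """
    x = rng.standard_normal(shape)
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):   # t < 1 inside the loop, so no division by zero
        x1_hat = denoise(x, t)
        velocity = (x1_hat - x) / (1.0 - t)
        x = x + (t_next - t) * velocity
    return x
```

As a sanity check, an oracle "model" that always returns the same target structure drives the sampler exactly onto that target, because the last Euler step covers the full remaining distance (1 - t):

```python
target = np.arange(12, dtype=float).reshape(4, 3)
sample = euler_sample(lambda x, t: target, target.shape, np.random.default_rng(0))
```

With a learned model, different noise draws lead to different structures, which is what turns a single-state predictor into a sampler of ensembles.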
Cite
Text
Jing et al. "AlphaFold Meets Flow Matching for Generating Protein Ensembles." International Conference on Machine Learning, 2024.
Markdown
[Jing et al. "AlphaFold Meets Flow Matching for Generating Protein Ensembles." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/jing2024icml-alphafold/)
BibTeX
@inproceedings{jing2024icml-alphafold,
title = {{AlphaFold Meets Flow Matching for Generating Protein Ensembles}},
author = {Jing, Bowen and Berger, Bonnie and Jaakkola, Tommi},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {22277-22303},
volume = {235},
url = {https://mlanthology.org/icml/2024/jing2024icml-alphafold/}
}