MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training
Abstract
Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure prediction is often compromised for protein sequences that lack sufficient homologous information to construct a high-quality MSA. Although various methods have been proposed to generate high-quality MSA under these conditions, they either fall short in comprehensively capturing the intricate co-evolutionary patterns within MSA or require guidance from external oracle models. Here we introduce MSAGPT, a novel approach to prompting protein structure prediction via MSA generative pre-training in the low-MSA regime. MSAGPT employs a simple yet effective 2D evolutionary positional encoding scheme to model complex evolutionary patterns. Building on this, its flexible 1D MSA decoding framework facilitates zero- or few-shot learning. Moreover, we demonstrate that leveraging feedback from AlphaFold2 (AF2) can further enhance the model's capacity via Rejective Fine-tuning (RFT) and Reinforcement Learning from AF2 Feedback (RLAF). Extensive experiments confirm the efficacy of MSAGPT in generating faithful and informative MSA (up to +8.5% TM-Score in few-shot scenarios). Transfer learning experiments also demonstrate its potential for a wide range of tasks that depend on MSA quality.
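The abstract pairs a 2D evolutionary positional encoding with 1D MSA decoding but does not spell out either. The following is a minimal sketch under stated assumptions, not MSAGPT's implementation: it flattens an aligned MSA (query first) row by row into a single token stream and gives each token two position ids, its column within the alignment and its row within the MSA. Because tokens in different rows share column ids, a left-to-right decoder over this stream can still tell which residues are aligned. The helper names (`flatten_msa`, `sinusoid`, `encode_2d`) and the sinusoidal split of channels between column and row are hypothetical stand-ins; the paper's actual encoding may differ.

```python
import math
from typing import List, Tuple


def flatten_msa(msa: List[str]) -> Tuple[List[str], List[int], List[int]]:
    """Flatten an aligned MSA (query first) into one 1D token stream.

    Each token keeps two position ids: its column (aligned residue position)
    and its row (which sequence of the MSA it came from).
    """
    if not msa:
        return [], [], []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    tokens, col_ids, row_ids = [], [], []
    for r, seq in enumerate(msa):  # rows are emitted one after another
        for c, residue in enumerate(seq):
            tokens.append(residue)
            col_ids.append(c)
            row_ids.append(r)
    return tokens, col_ids, row_ids


def sinusoid(pos: int, dim: int) -> List[float]:
    """Standard 1D sinusoidal encoding of one position; returns dim values."""
    out: List[float] = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        out.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return out


def encode_2d(col: int, row: int, dim: int) -> List[float]:
    """Hypothetical 2D encoding: first half of channels for column, second for row."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    return sinusoid(col, dim // 2) + sinusoid(row, dim // 2)


if __name__ == "__main__":
    msa = ["MKV-", "MRVL", "M-VL"]  # toy alignment; first row is the query
    tokens, cols, rows = flatten_msa(msa)
    print(tokens[:6])  # ['M', 'K', 'V', '-', 'M', 'R']
    print(cols[:6])    # [0, 1, 2, 3, 0, 1]
    print(rows[:6])    # [0, 0, 0, 0, 1, 1]
    print(len(encode_2d(cols[5], rows[5], dim=8)))  # 8
```

Keeping the stream one-dimensional means an ordinary autoregressive decoder can generate new MSA rows token by token, while the column id carries the alignment structure that a purely 1D position index would lose.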
Cite
Chen et al. "MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training." Neural Information Processing Systems, 2024. doi:10.52202/079017-1184
[Chen et al. "MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/chen2024neurips-msagpt/) doi:10.52202/079017-1184
@inproceedings{chen2024neurips-msagpt,
title = {{MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training}},
author = {Chen, Bo and Bei, Zhilei and Cheng, Xingyi and Li, Pan and Tang, Jie and Song, Le},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1184},
url = {https://mlanthology.org/neurips/2024/chen2024neurips-msagpt/}
}