Large Language Model Is Secretly a Protein Sequence Optimizer

Abstract

We consider the protein sequence engineering problem, which aims to find protein sequences with high fitness levels, starting from a given wild-type sequence. Directed evolution has been the dominant paradigm in this field: an iterative process that generates variants and selects them via experimental feedback. We demonstrate that large language models (LLMs), despite being trained on massive text corpora, are secretly protein sequence optimizers. With a directed evolutionary method, LLMs can perform protein engineering through Pareto- and experiment-budget-constrained optimization, demonstrating success on both synthetic and experimental fitness landscapes.
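The loop described in the abstract — propose variants, score them against a fitness landscape under a fixed experiment budget, and keep the best as parents for the next round — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `mock_llm_propose` stands in for the LLM proposal step (here replaced by random point mutations), and the fitness function is a toy synthetic landscape.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mock_llm_propose(parents, n_variants, rng):
    # Placeholder for the LLM proposal step; here we apply random
    # single-point mutations to randomly chosen parent sequences.
    variants = []
    for _ in range(n_variants):
        parent = rng.choice(parents)
        pos = rng.randrange(len(parent))
        variants.append(parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:])
    return variants

def fitness(seq):
    # Toy synthetic landscape (hypothetical): count of alanine residues.
    return seq.count("A")

def directed_evolution(wild_type, rounds=5, pop_size=8, budget=40, rng=None):
    rng = rng or random.Random(0)
    evaluated = {wild_type: fitness(wild_type)}  # each entry = one "experiment"
    parents = [wild_type]
    for _ in range(rounds):
        if len(evaluated) >= budget:
            break  # experiment budget exhausted
        for v in mock_llm_propose(parents, pop_size, rng):
            if v not in evaluated and len(evaluated) < budget:
                evaluated[v] = fitness(v)
        # Select the top-scoring sequences as parents for the next round.
        parents = sorted(evaluated, key=evaluated.get, reverse=True)[:pop_size]
    best = max(evaluated, key=evaluated.get)
    return best, evaluated[best]

best, score = directed_evolution("MKTWVLQG" * 3)
```

In the paper's setting, the proposal step would query an LLM conditioned on the current parent pool, and selection could additionally be Pareto-constrained across multiple fitness objectives rather than ranked on a single scalar.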

Cite

Text

Wang et al. "Large Language Model Is Secretly a Protein Sequence Optimizer." ICLR 2025 Workshops: LMRL, 2025.

Markdown

[Wang et al. "Large Language Model Is Secretly a Protein Sequence Optimizer." ICLR 2025 Workshops: LMRL, 2025.](https://mlanthology.org/iclrw/2025/wang2025iclrw-large/)

BibTeX

@inproceedings{wang2025iclrw-large,
  title     = {{Large Language Model Is Secretly a Protein Sequence Optimizer}},
  author    = {Wang, Yinkai and He, Jiaxing and Du, Yuanqi and Chen, Xiaohui and Li, Jianan Canal and Liu, Liping and Xu, Xiaolin and Hassoun, Soha},
  booktitle = {ICLR 2025 Workshops: LMRL},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/wang2025iclrw-large/}
}