Towards Protein Sequence & Structure Co-Design with Multi-Modal Language Models

Abstract

Proteins perform diverse biological functions, governed by the intricate relationship between their sequence and three-dimensional structure. While protein language models (PLMs) have demonstrated remarkable success in functional annotation and structure prediction, their potential for sequence-structure co-design remains underexplored. This limitation arises from pre-training objectives that favor masked token prediction over generative modeling. In this work, we systematically explore sampling strategies to enhance the generative capabilities of PLMs for co-design. Notably, we introduce a ranked iterative decoding scheme with re-masking, enabling PLMs to generate sequences and structures more effectively. Benchmarking ESM3 across multiple scales, we demonstrate that PLMs, when used effectively at sampling time, can outperform specialized co-design architectures that lack comparable scaling properties. Our work advances the field of computational protein design by equipping PLMs with robust generative capabilities tailored to sequence-structure interdependence.
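
To illustrate the kind of decoder the abstract refers to, the sketch below shows the general idea behind confidence-ranked iterative decoding with re-masking (in the spirit of MaskGIT-style samplers): all masked positions are predicted each step, the highest-confidence predictions are committed, and the rest are re-masked for the next iteration. This is a minimal sketch under assumptions, not the paper's exact algorithm; the function name, the cosine commitment schedule, and the stand-in model are illustrative choices.

import math
import torch

def ranked_iterative_decode(model, seq_len, mask_id, num_steps=8, temperature=1.0):
    # Start from a fully masked track (sequence or structure tokens).
    tokens = torch.full((seq_len,), mask_id, dtype=torch.long)
    committed = torch.zeros(seq_len, dtype=torch.bool)

    for step in range(num_steps):
        # Predict every position; the model is assumed to return
        # (batch, seq_len, vocab) logits.
        logits = model(tokens.unsqueeze(0)).squeeze(0)
        probs = torch.softmax(logits / temperature, dim=-1)
        sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)

        # Positions committed in earlier steps are never reconsidered:
        # give them infinite confidence so they always rank first.
        conf[committed] = float("inf")

        # Cosine schedule (an assumption): commit a growing fraction of
        # positions each step, reaching the full length on the last step.
        frac = (step + 1) / num_steps
        n_keep = max(1, round(seq_len * (1 - math.cos(frac * math.pi / 2))))

        keep = torch.argsort(conf, descending=True)[:n_keep]
        newly = keep[~committed[keep]]
        tokens[newly] = sampled[newly]
        committed[keep] = True

        # Re-mask everything not yet committed and iterate.
        tokens[~committed] = mask_id

    return tokens

# Toy usage with a random stand-in for the PLM forward pass
# (a real run would score masked positions with ESM3 instead).
stub = lambda x: torch.randn(x.shape[0], x.shape[1], 33)
print(ranked_iterative_decode(stub, seq_len=64, mask_id=32))

The ranking step is what distinguishes this from plain iterative refinement: low-confidence tokens are discarded and re-predicted in later passes with more committed context, rather than being kept from the first pass.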

Cite

Text

Lu et al. "Towards Protein Sequence & Structure Co-Design with Multi-Modal Language Models." ICLR 2025 Workshops: GEM, 2025.

Markdown

[Lu et al. "Towards Protein Sequence & Structure Co-Design with Multi-Modal Language Models." ICLR 2025 Workshops: GEM, 2025.](https://mlanthology.org/iclrw/2025/lu2025iclrw-protein/)

BibTeX

@inproceedings{lu2025iclrw-protein,
  title     = {{Towards Protein Sequence \& Structure Co-Design with Multi-Modal Language Models}},
  author    = {Lu, Stephen Zhewen and Lu, Jiarui and Guo, Hongyu and Tang, Jian},
  booktitle = {ICLR 2025 Workshops: GEM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/lu2025iclrw-protein/}
}