Residue-Level Text Conditioning for Protein Language Model Mutation Effect Prediction

Abstract

To augment protein sequence models with language, we introduce Conditioning on Residue-level Annotations from TExt (CRATE), a fine-tuning method that fuses two models using feature-wise linear modulation (FiLM). We fine-tune protein language models at scale, first constructing a dataset (CRATE-train) that joins annotations from InterPro and UniProtKB with sequences from UniRef90, yielding approximately 105 million sequences, each with at least three annotations and nearly 100% sequence coverage on average. Applying CRATE to mutation effect prediction improves performance on the ProteinGym benchmark over prior baselines. Leveraging these improvements, we show that CRATE can be used to select the annotations with the largest positive impact on mutation effect prediction and to estimate deep mutational scanning (DMS) scores across multiple assay selection types.
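
The abstract describes fusing residue-level text features into a protein language model via FiLM, in which one network's features are scaled and shifted by parameters predicted from another input. Below is a minimal sketch of that conditioning pattern; the module name, dimensions, and layer choices are illustrative assumptions, not CRATE's actual implementation.

import torch
import torch.nn as nn

class FiLMConditioning(nn.Module):
    """Minimal FiLM layer: per-residue text features modulate protein LM
    hidden states via a learned scale (gamma) and shift (beta).
    Names and dimensions here are hypothetical, not the paper's.
    """
    def __init__(self, text_dim: int, hidden_dim: int):
        super().__init__()
        # Project per-residue annotation embeddings to FiLM parameters.
        self.to_gamma = nn.Linear(text_dim, hidden_dim)
        self.to_beta = nn.Linear(text_dim, hidden_dim)

    def forward(self, protein_hidden: torch.Tensor,
                text_features: torch.Tensor) -> torch.Tensor:
        # protein_hidden: (batch, seq_len, hidden_dim) from the protein LM
        # text_features:  (batch, seq_len, text_dim) residue-aligned annotations
        gamma = self.to_gamma(text_features)
        beta = self.to_beta(text_features)
        return gamma * protein_hidden + beta

# Usage: modulate hidden states for 2 sequences of length 300.
film = FiLMConditioning(text_dim=512, hidden_dim=1280)
h = torch.randn(2, 300, 1280)  # protein LM hidden states
t = torch.randn(2, 300, 512)   # residue-level annotation embeddings
out = film(h, t)               # same shape as h: (2, 300, 1280)

Because gamma and beta are computed per residue, each position's hidden state is modulated only by the annotations aligned to it, which is what makes residue-level (rather than sequence-level) conditioning possible.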

Cite

Text

Berenberg et al. "Residue-Level Text Conditioning for Protein Language Model Mutation Effect Prediction." ICLR 2025 Workshops: GEM, 2025.

Markdown

[Berenberg et al. "Residue-Level Text Conditioning for Protein Language Model Mutation Effect Prediction." ICLR 2025 Workshops: GEM, 2025.](https://mlanthology.org/iclrw/2025/berenberg2025iclrw-residuelevel/)

BibTeX

@inproceedings{berenberg2025iclrw-residuelevel,
  title     = {{Residue-Level Text Conditioning for Protein Language Model Mutation Effect Prediction}},
  author    = {Berenberg, Dan and Gruver, Nate and Amin, Alan Nawzad and Groth, Peter Mørch and Chen, Leo and Srivastava, Harsh R. and Notin, Pascal and Marks, Debora Susan and Wilson, Andrew Gordon and Cho, Kyunghyun and Bonneau, Richard},
  booktitle = {ICLR 2025 Workshops: GEM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/berenberg2025iclrw-residuelevel/}
}