Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction

Abstract

Retrieving homologous protein sequences is essential for a broad range of protein modeling tasks such as fitness prediction, protein design, structure modeling, and protein-protein interactions. Traditional workflows have relied on a two-step process: first retrieving homologs via multiple sequence alignments (MSAs), then training models on one or more of these alignments. However, MSA-based retrieval is computationally expensive, struggles with highly divergent sequences and complex insertions/deletions, and operates independently of downstream modeling. We introduce Protriever, an end-to-end differentiable framework that unifies retrieval and task modeling. Focusing on protein fitness prediction, we show that Protriever achieves performance on par with the most sensitive MSA-based tools while being orders of magnitude faster at retrieval, as it relies on efficient vector search. Protriever is both architecture- and task-agnostic, and can flexibly adapt to different retrieval strategies and protein databases at inference, offering a scalable alternative to alignment-centric approaches.
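To illustrate what retrieval by vector search can look like in general, here is a minimal sketch, not the authors' implementation. It assumes each protein has already been mapped to a fixed-length embedding by some encoder (not shown), and ranks database entries by cosine similarity to the query. The function names `cosine` and `retrieve_top_k`, the toy vectors, and the choice of cosine similarity are all hypothetical choices made for this example; the abstract does not specify the similarity measure or index used.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, db_vecs, k=2):
    """Return (index, score) pairs for the k database vectors most similar to the query.

    This is an exhaustive scan, which is enough for a toy example; large
    databases would typically use an approximate nearest-neighbor index.
    """
    scored = ((cosine(query_vec, v), i) for i, v in enumerate(db_vecs))
    best = heapq.nlargest(k, scored)  # sorted from highest to lowest score
    return [(i, s) for s, i in best]


# Toy embeddings standing in for encoded protein sequences (made-up values).
toy_db = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [0.95, 0.02, 0.0]

hits = retrieve_top_k(toy_query, toy_db, k=2)
for index, score in hits:
    print(f"database entry {index}: similarity {score:.4f}")
```

Unlike alignment-based search, a lookup of this kind only compares fixed-length vectors, so its cost does not depend on aligning the sequences themselves.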

Cite

Text

Weitzman et al. "Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction." ICLR 2025 Workshops: LMRL, 2025.

Markdown

[Weitzman et al. "Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction." ICLR 2025 Workshops: LMRL, 2025.](https://mlanthology.org/iclrw/2025/weitzman2025iclrw-protriever-a/)

BibTeX

@inproceedings{weitzman2025iclrw-protriever-a,
  title     = {{Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction}},
  author    = {Weitzman, Ruben and Groth, Peter Mørch and Otani, Aoi and Marks, Debora Susan and Gal, Yarin and Notin, Pascal},
  booktitle = {ICLR 2025 Workshops: LMRL},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/weitzman2025iclrw-protriever-a/}
}