Reverse Distillation: Consistently Scaling Protein Language Model Representations

Abstract

Unlike the predictable scaling laws in natural language processing and computer vision, protein language models (PLMs) scale poorly: for many tasks, models within the same family plateau or even decrease in performance, with mid-sized models often outperforming the largest in the family. We introduce Reverse Distillation, a principled framework that decomposes large PLM representations into orthogonal subspaces guided by smaller models of the same family. The resulting embeddings have a nested, Matryoshka-style structure: the first $k$ dimensions of a larger model's embedding are exactly the representation from the smaller model. This ensures that larger reverse-distilled models consistently outperform smaller ones. A motivating intuition is that smaller models, constrained by capacity, preferentially encode broadly shared protein features. Reverse distillation isolates these shared features and orthogonally extracts additional contributions from larger models, preventing interference between the two. On ProteinGym benchmarks, reverse-distilled ESM-2 variants outperform their respective baselines at the same embedding dimensionality, with the reverse-distilled 15-billion-parameter model achieving the strongest performance. Our framework is generalizable to any model family where scaling challenges persist. Code and trained models are available at https://github.com/rohitsinghlab/plm_reverse_distillation.
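
The nested, orthogonal structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes one plausible construction (a linear least-squares fit of the large-model embedding from the small-model embedding, followed by PCA of the residual), and the function name `reverse_distill`, its parameters, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np

def reverse_distill(h_small, h_large, n_extra):
    """Hypothetical sketch: build a nested embedding [h_small | extra].

    h_small: (n, d_s) embeddings from the smaller model.
    h_large: (n, d_l) embeddings from the larger model, same n inputs.
    n_extra: number of additional dimensions taken from the larger model.
    Assumption: linear regression + PCA stands in for the paper's method.
    """
    # Center both embedding sets so the fit needs no intercept term.
    xs = h_small - h_small.mean(axis=0)
    xl = h_large - h_large.mean(axis=0)

    # Least-squares map from small to large; the residual is the part of
    # the large embedding that the small embedding cannot linearly explain.
    w, *_ = np.linalg.lstsq(xs, xl, rcond=None)
    resid = xl - xs @ w

    # PCA of the residual via SVD; keep the top n_extra directions.
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    extra = resid @ vt[:n_extra].T

    # Nested layout: the first d_s dimensions are exactly h_small.
    return np.concatenate([h_small, extra], axis=1)

# Random stand-ins for real PLM embeddings (illustration only).
rng = np.random.default_rng(0)
h_s = rng.normal(size=(200, 8))
h_l = rng.normal(size=(200, 20))
z = reverse_distill(h_s, h_l, n_extra=12)
print(z.shape)  # (200, 20)
```

Under these assumptions, the first 8 columns of `z` equal `h_s` exactly, and the 12 added columns are uncorrelated with the centered `h_s` on the fitting data, because a least-squares residual is orthogonal to its regressors.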

Cite

Text

Catrina et al. "Reverse Distillation: Consistently Scaling Protein Language Model Representations." International Conference on Learning Representations, 2026.

Markdown

[Catrina et al. "Reverse Distillation: Consistently Scaling Protein Language Model Representations." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/catrina2026iclr-reverse/)

BibTeX

@inproceedings{catrina2026iclr-reverse,
  title     = {{Reverse Distillation: Consistently Scaling Protein Language Model Representations}},
  author    = {Catrina, Darius and Bepler, Christian and Sledzieski, Samuel and Singh, Rohit},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/catrina2026iclr-reverse/}
}