Probing the Embedding Space of Protein Foundation Models Through Intrinsic Dimension Analysis
Abstract
Protein foundation models produce embeddings that are valuable for various downstream tasks, yet the structure and information content of these embeddings remain poorly understood, particularly in relation to diverse pre-training tasks and input modalities. We apply intrinsic dimension ($I_d$) analysis to quantify the complexity of protein embeddings from several widely used models, including ESM-2, ESM-IF, ProstT5, and ProteinMPNN. We also employ $I_d$ correlation ($I_d$Cor) to measure the shared information between different embeddings. Our results reveal a universality in protein embeddings, with similar $I_d$ scales across models and strong correlations between protein and residue embeddings. We observe significant redundancy, with $I_d$ values much smaller than the original embedding dimensions. We also show that models capture both spatial and sequential long-range correlations, with decay rates that differ by input modality and pre-training task. Finally, we analyze mutant embeddings, revealing that mutations cluster effectively by site and that fine-tuning further reduces the $I_d$ to capture task-specific representations.
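As an illustration of what estimating an intrinsic dimension involves, the sketch below implements the two-nearest-neighbor (TwoNN) maximum-likelihood estimator in NumPy. The abstract does not say which estimator the authors use, so TwoNN is an assumption here, and the function name `twonn_intrinsic_dimension` and the synthetic data are hypothetical stand-ins for real protein embeddings.

```python
import numpy as np


def twonn_intrinsic_dimension(x):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbor distances.

    Assumption: this is the TwoNN maximum-likelihood estimator, used here only
    as an example; the abstract does not name the estimator the authors used.
    """
    x = np.asarray(x, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = np.sum(x * x, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)     # exclude each point's distance to itself

    # The two smallest distances in each row: r1 (nearest) and r2 (second nearest).
    nearest = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]

    # Drop exact duplicates (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # Under the TwoNN model mu follows a Pareto law with shape d,
    # whose maximum-likelihood estimate is N / sum(log mu).
    return keep.sum() / np.sum(np.log(mu))


# Synthetic stand-in for embeddings: a 2-dimensional Gaussian cloud
# linearly embedded in a 64-dimensional ambient space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 2))
embedded = latent @ rng.normal(size=(2, 64))

estimate = twonn_intrinsic_dimension(embedded)
print(f"ambient dimension: {embedded.shape[1]}, estimated I_d: {estimate:.2f}")
```

On this synthetic data the estimate should land near 2 even though the ambient dimension is 64, which mirrors the kind of redundancy the abstract reports: an $I_d$ far below the nominal embedding dimension.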
Cite
Text
Yang et al. "Probing the Embedding Space of Protein Foundation Models Through Intrinsic Dimension Analysis." NeurIPS 2024 Workshops: AIDrugX, 2024.
Markdown
[Yang et al. "Probing the Embedding Space of Protein Foundation Models Through Intrinsic Dimension Analysis." NeurIPS 2024 Workshops: AIDrugX, 2024.](https://mlanthology.org/neuripsw/2024/yang2024neuripsw-probing/)
BibTeX
@inproceedings{yang2024neuripsw-probing,
title = {{Probing the Embedding Space of Protein Foundation Models Through Intrinsic Dimension Analysis}},
author = {Yang, Soojung and Nam, Juno and Perez, Tynan and Song, Jinyeop and Du, Xiaochen and Gomez-Bombarelli, Rafael},
booktitle = {NeurIPS 2024 Workshops: AIDrugX},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/yang2024neuripsw-probing/}
}