FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning
Abstract
Accurate protein representations that integrate sequence and three-dimensional (3D) structure are critical to many biological and biomedical tasks. Most existing models either ignore structure or combine it with sequence through a single, static fusion step. Here we present FusionProt, a unified model that learns representations via iterative, bidirectional fusion between a protein language model and a structure encoder. A single learnable token serves as a carrier, alternating between sequence attention and spatial message passing across layers. FusionProt is evaluated on Enzyme Commission (EC), Gene Ontology (GO), and mutation stability prediction tasks. It improves Fmax by a median of +1.3 points (up to +2.0) across EC and GO benchmarks, and boosts AUROC by +3.6 points over the strongest baseline on mutation stability. Inference cost remains practical, with only ~2–5% runtime overhead. Beyond state-of-the-art performance, we further demonstrate FusionProt’s practical relevance through representative biological case studies, suggesting that the model captures biologically relevant features.
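The fusion mechanism the abstract describes can be illustrated with a minimal sketch: a single carrier token that alternates, layer by layer, between attending over sequence embeddings and aggregating spatial neighbor messages. This is a hedged illustration only, not the authors' implementation; all dimensions, the distance threshold, and the update rules are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 10                        # embedding dim and residue count (illustrative)

seq_emb = rng.normal(size=(n, d))   # per-residue sequence embeddings (stand-in for a PLM)
coords = rng.normal(size=(n, 3))    # residue 3D coordinates (stand-in for a structure)
fusion_token = np.zeros(d)          # the single learnable carrier token (zero init here)


def seq_attention(token, emb):
    """Token attends over sequence embeddings via scaled dot-product attention."""
    scores = emb @ token / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ emb                  # attention-weighted summary of the sequence


def struct_message_passing(token, emb, coords, radius=1.5):
    """Token absorbs messages averaged over spatially close residue pairs."""
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    adj = (dist < radius).astype(float)                     # spatial neighborhood graph
    msg = (adj @ emb) / np.maximum(adj.sum(1, keepdims=True), 1.0)
    return token + msg.mean(0)


# Alternate the two fusion steps across layers, as in iterative bidirectional fusion.
for layer in range(4):
    if layer % 2 == 0:
        fusion_token = fusion_token + seq_attention(fusion_token, seq_emb)
    else:
        fusion_token = struct_message_passing(fusion_token, seq_emb, coords)

print(fusion_token.shape)  # a single d-dimensional fused representation
```

The key design point, as stated in the abstract, is that the same token is threaded through both modalities repeatedly rather than fusing sequence and structure once in a static step.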
Cite
Text
Kalifa et al. "FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning." Transactions on Machine Learning Research, 2025.
Markdown
[Kalifa et al. "FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/kalifa2025tmlr-fusionprot/)
BibTeX
@article{kalifa2025tmlr-fusionprot,
title = {{FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning}},
author = {Kalifa, Dan and Singer, Uriel and Radinsky, Kira},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/kalifa2025tmlr-fusionprot/}
}