Do Protein Transformers Have Biological Intelligence?

Abstract

Deep neural networks, particularly Transformers, have been widely adopted for predicting the functional properties of proteins. In this work, we explore whether Protein Transformers can capture biological intelligence from protein sequences. To this end, we first introduce a protein function dataset, Protein-FN, which provides over 9,000 protein samples with meaningful labels. Second, we devise a new Transformer architecture, Sequence Protein Transformers (SPT), for computationally efficient protein function prediction. Third, we develop a novel Explainable Artificial Intelligence (XAI) technique called Sequence Score, which efficiently interprets the decision-making processes of protein models. This helps overcome the difficulty of deciphering the biological intelligence embedded in Protein Transformers. Remarkably, even our smallest model, SPT-Tiny, which contains only 5.4M parameters, achieves an accuracy of 94.3% on the Antibiotic Resistance (AR) dataset and 99.6% on the Protein-FN dataset, all when trained from scratch. In addition, Sequence Score reveals that our SPT models discover several meaningful patterns in the sequence structures of protein data, and these patterns align closely with domain knowledge in the biology community. We have officially released the Protein-FN dataset on Hugging Face Datasets at https://huggingface.co/datasets/Protein-FN/Protein-FN. Our code is available at https://github.com/fudong03/BioIntelligence.

Cite

Text

Lin et al. "Do Protein Transformers Have Biological Intelligence?" European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06118-8_22

Markdown

[Lin et al. "Do Protein Transformers Have Biological Intelligence?" European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/lin2025ecmlpkdd-protein/) doi:10.1007/978-3-032-06118-8_22

BibTeX

@inproceedings{lin2025ecmlpkdd-protein,
  title     = {{Do Protein Transformers Have Biological Intelligence?}},
  author    = {Lin, Fudong and Du, Wanrou and Liu, Jinchan and Milon, Tarikul I. and Meche, Shelby and Xu, Wu and Qin, Xiaoqi and Yuan, Xu},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {373--390},
  doi       = {10.1007/978-3-032-06118-8_22},
  url       = {https://mlanthology.org/ecmlpkdd/2025/lin2025ecmlpkdd-protein/}
}