DARKIN: A Zero-Shot Classification Benchmark and an Evaluation of Protein Language Models

Abstract

Protein language models (pLMs) aim to capture the complex information embedded within protein sequences and are useful for downstream protein prediction tasks. With a plethora of pLMs available, there is now a critical need to benchmark their performance across diverse tasks. Here, we introduce a biologically relevant zero-shot prediction benchmark, focusing on dark kinase-phosphosite associations. Kinases are the enzymes responsible for protein phosphorylation, and they play vital roles in cellular signaling. While phosphoproteomics allows large-scale identification of phosphosites, determining the catalyzing kinase remains challenging. We present a zero-shot classification benchmark dataset, DARKIN, for assigning phosphosites to one of the understudied kinases (dark kinases). DARKIN provides train, validation, and test folds split based on zero-shot classification, kinase groups, and sequence similarities. Evaluation of pLMs using a novel training-free k-NN-based zero-shot classifier and a bilinear zero-shot classifier reveals superior performance by ESM models, ProtT5-XL, and the recently introduced structure-based SaProt model. We believe this biologically relevant yet challenging benchmark will further facilitate assessing the efficacy of pLMs and aid the exploration of dark kinases.
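
The abstract does not spell out how the training-free k-NN zero-shot classifier works, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that phosphosites and kinases have already been embedded by a pLM, that each training phosphosite is labeled with a seen kinase, and that test kinases (the dark kinases) never appear in training. The function names, the similarity-weighted averaging step, and the toy vectors are all hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis (safe for zero vectors)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def knn_zero_shot_predict(site_emb, train_site_embs, train_kinase_ids,
                          seen_kinase_embs, unseen_kinase_embs, k=3):
    """Assign a query phosphosite to one of the unseen kinases without training.

    Assumed procedure (hypothetical, for illustration only):
      1. Find the k training phosphosites most similar to the query (cosine).
      2. Average the embeddings of the seen kinases that phosphorylate them,
         weighted by similarity, to get a kinase-space prototype.
      3. Return the unseen kinase whose embedding is closest to that prototype.
    """
    train_kinase_ids = np.asarray(train_kinase_ids)

    # Step 1: cosine similarity between the query site and every training site.
    sims = l2_normalize(train_site_embs) @ l2_normalize(site_emb)
    top = np.argsort(-sims)[:k]

    # Step 2: similarity-weighted mean of the neighbors' kinase embeddings.
    weights = np.clip(sims[top], 0.0, None)
    if weights.sum() == 0.0:
        weights = np.ones(len(top))
    neighbor_kinases = seen_kinase_embs[train_kinase_ids[top]]
    prototype = (weights[:, None] * neighbor_kinases).sum(axis=0) / weights.sum()

    # Step 3: score every unseen kinase against the prototype.
    scores = l2_normalize(unseen_kinase_embs) @ l2_normalize(prototype)
    return int(np.argmax(scores)), scores


# Toy usage with made-up vectors standing in for pLM embeddings.
train_sites = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_labels = [0, 0, 1, 1]                      # indices into seen_kinases
seen_kinases = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
dark_kinases = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]])

pred, scores = knn_zero_shot_predict(
    np.array([1.0, 0.05]), train_sites, train_labels,
    seen_kinases, dark_kinases, k=2)
```

Because nothing is fitted, the quality of the prediction in a sketch like this depends entirely on the embedding spaces, which is what makes a training-free classifier a convenient probe for comparing pLMs.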

Cite

Text

Sunar et al. "DARKIN: A Zero-Shot Classification Benchmark and an Evaluation of Protein Language Models." ICLR 2024 Workshops: MLGenX, 2024.

Markdown

[Sunar et al. "DARKIN: A Zero-Shot Classification Benchmark and an Evaluation of Protein Language Models." ICLR 2024 Workshops: MLGenX, 2024.](https://mlanthology.org/iclrw/2024/sunar2024iclrw-darkin/)

BibTeX

@inproceedings{sunar2024iclrw-darkin,
  title     = {{DARKIN: A Zero-Shot Classification Benchmark and an Evaluation of Protein Language Models}},
  author    = {Sunar, Emine Ayşe and Işık, Zeynep and Pekey, Mert and Cinbis, Ramazan Gokberk and Tastan, Oznur},
  booktitle = {ICLR 2024 Workshops: MLGenX},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/sunar2024iclrw-darkin/}
}