CARE: A Benchmark Suite for the Classification and Retrieval of Enzymes

Abstract

Enzymes are important proteins that catalyze chemical reactions. In recent years, machine learning methods have emerged to predict enzyme function from sequence; however, there are no standardized benchmarks to evaluate these methods. We introduce CARE, a benchmark and dataset suite for the Classification And Retrieval of Enzymes. CARE centers on two tasks: (1) classification of a protein sequence by its Enzyme Commission (EC) number and (2) retrieval of an EC number given a chemical reaction. For each task, we design train-test splits to evaluate different kinds of out-of-distribution generalization that are relevant to real use cases. For the classification task, we provide baselines for state-of-the-art methods. Because the retrieval task has not been previously formalized, we propose a method called Contrastive Reaction-EnzymE Pretraining (CREEP) as one of the first baselines for this task and compare it to the recent method CLIPZyme. CARE is available at https://github.com/jsunn-y/CARE/.
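
Both tasks use the EC number as the label, and an EC number is a four-field hierarchy (for example, 1.1.1.1 denotes alcohol dehydrogenase). The sketch below illustrates how a predicted EC number might be scored against a true one at each level of that hierarchy. It is an illustration only: the function names `ec_levels`, `matching_depth`, and `accuracy_at_level` are hypothetical, and level-wise accuracy is assumed here as a plausible metric rather than taken from the CARE codebase.

```python
def ec_levels(ec: str) -> list[str]:
    """Split an EC number such as '1.1.1.1' into its four hierarchical fields."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return parts


def matching_depth(predicted: str, true: str) -> int:
    """Count how many leading EC fields agree (0 to 4)."""
    depth = 0
    for p, t in zip(ec_levels(predicted), ec_levels(true)):
        if p != t:
            break
        depth += 1
    return depth


def accuracy_at_level(pairs, level: int) -> float:
    """Fraction of (predicted, true) pairs that agree on the first `level` fields.

    Assumed metric for illustration; not necessarily the one used by CARE.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(matching_depth(p, t) >= level for p, t in pairs)
    return hits / len(pairs)
```

A prediction of 1.1.1.2 for a true label of 1.1.1.1 would count as correct at levels 1 through 3 and incorrect at level 4, which is one way to separate coarse from fine-grained errors.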

Cite

Text

Yang et al. "CARE: A Benchmark Suite for the Classification and Retrieval of Enzymes." Neural Information Processing Systems, 2024. doi:10.52202/079017-0101

Markdown

[Yang et al. "CARE: A Benchmark Suite for the Classification and Retrieval of Enzymes." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/yang2024neurips-care/) doi:10.52202/079017-0101

BibTeX

@inproceedings{yang2024neurips-care,
  title     = {{CARE: A Benchmark Suite for the Classification and Retrieval of Enzymes}},
  author    = {Yang, Jason and Mora, Ariane and Liu, Shengchao and Wittmann, Bruce J. and Anandkumar, Anima and Arnold, Frances H. and Yue, Yisong},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0101},
  url       = {https://mlanthology.org/neurips/2024/yang2024neurips-care/}
}