ProteinShake: Building Datasets and Benchmarks for Deep Learning on Protein Structures
Abstract
We present ProteinShake, a Python software package that simplifies dataset creation and model evaluation for deep learning on protein structures. Users can create custom datasets or load an extensive set of pre-processed datasets from the Protein Data Bank (PDB) and AlphaFoldDB. Each dataset is associated with prediction tasks and evaluation functions covering a broad array of biological challenges. A benchmark on these tasks shows that pre-training almost always improves performance, the optimal data modality (graphs, voxel grids, or point clouds) is task-dependent, and models struggle to generalize to new structures. ProteinShake makes protein structure data easily accessible and comparison among models straightforward, providing challenging benchmark settings with real-world implications. ProteinShake is available at: https://proteinshake.ai
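The three data modalities named in the abstract can be illustrated with a minimal, standard-library sketch. This is not the ProteinShake API: the function names (`to_point_cloud`, `to_graph`, `to_voxels`), the distance cutoff `eps`, the `voxel_size` parameter, and the toy coordinates are all assumptions made for illustration only.

```python
from math import dist, floor

# Toy residue coordinates in angstroms (made-up values, not a real structure).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
residues = ["M", "K", "V", "L"]


def to_point_cloud(coords, residues):
    """Point cloud: a list of (residue label, xyz) pairs with no connectivity."""
    return [(label, xyz) for label, xyz in zip(residues, coords)]


def to_graph(coords, eps):
    """Graph: an edge (i, j) for every residue pair at most eps angstroms apart."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= eps:
                edges.append((i, j))
    return edges


def to_voxels(coords, voxel_size):
    """Voxel grid: residue counts per cubic cell, keyed by integer cell index."""
    grid = {}
    for x, y, z in coords:
        cell = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        grid[cell] = grid.get(cell, 0) + 1
    return grid


point_cloud = to_point_cloud(coords, residues)
graph_edges = to_graph(coords, eps=4.0)
voxel_grid = to_voxels(coords, voxel_size=5.0)
```

The same coordinates yield three different inputs for a model: a set of labeled points, a sparse neighborhood graph, and an occupancy grid. Which of these suits a given prediction task is the question the benchmark in the abstract addresses.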
Cite
Text
Kucera et al. "ProteinShake: Building Datasets and Benchmarks for Deep Learning on Protein Structures." Neural Information Processing Systems, 2023.
Markdown
[Kucera et al. "ProteinShake: Building Datasets and Benchmarks for Deep Learning on Protein Structures." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/kucera2023neurips-proteinshake/)
BibTeX
@inproceedings{kucera2023neurips-proteinshake,
title = {{ProteinShake: Building Datasets and Benchmarks for Deep Learning on Protein Structures}},
author = {Kucera, Tim and Oliver, Carlos and Chen, Dexiong and Borgwardt, Karsten},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/kucera2023neurips-proteinshake/}
}