CPSea: Large-Scale Cyclic Peptide-Protein Complex Dataset for Machine Learning in Cyclic Peptide Design
Abstract
Cyclic peptides exhibit better binding affinity and proteolytic stability than their linear counterparts. However, the development of cyclic peptide design models is hindered by the scarcity of data. To address this, we introduce **CPSea** (**C**yclic **P**eptide **Sea**), a dataset of 2.71 million cyclic peptide-receptor complexes curated through systematic mining of the AlphaFold Database (AFDB). Our pipeline extracts compact domains from AFDB, identifies cyclization sites using $\beta$-carbon (C$_\beta$) distance thresholds, and applies multi-stage filtering to ensure structural fidelity and binding compatibility. Compared with experimental cyclic peptide data, CPSea shows similar distributions of metrics for structural fidelity and wet-lab compatibility. To our knowledge, CPSea is the largest cyclic peptide-receptor dataset to date, enabling end-to-end model training for the first time. The dataset also demonstrates the feasibility of simulating inter-chain interactions using intra-chain interactions, expanding the resources available to machine-learning models of protein-protein interactions. The dataset and relevant scripts are accessible on GitHub ([https://github.com/YZY010418/CPSea](https://github.com/YZY010418/CPSea)).
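To make the C$_\beta$ distance-threshold step concrete, here is a minimal sketch that scans a single chain for candidate cyclization sites: residue pairs whose C$_\beta$ atoms lie within a cutoff and that bound a segment of acceptable peptide length. The function name `find_cyclization_sites`, the 5.0 Å cutoff, and the 5 to 20 residue length bounds are hypothetical placeholders chosen for illustration, not the values or code used in the CPSea pipeline; see the GitHub repository for the actual scripts.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def find_cyclization_sites(
    cb_coords: Sequence[Coord],
    max_cb_dist: float = 5.0,  # hypothetical C-beta cutoff in angstroms (assumption)
    min_len: int = 5,          # hypothetical minimum peptide length in residues (assumption)
    max_len: int = 20,         # hypothetical maximum peptide length in residues (assumption)
) -> List[Tuple[int, int, float]]:
    """Return (start, end, distance) for segments whose end residues have close C-beta atoms.

    A segment spans residues start..end inclusive, so its length is end - start + 1.
    """
    sites = []
    n = len(cb_coords)
    for i in range(n):
        # Only consider end residues giving a segment length in [min_len, max_len].
        for j in range(i + min_len - 1, min(n, i + max_len)):
            d = math.dist(cb_coords[i], cb_coords[j])  # Euclidean C-beta distance
            if d <= max_cb_dist:
                sites.append((i, j, d))
    return sites


# Toy chain of 6 residues folded back on itself so that residues 0 and 5 are close.
toy = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]
sites = find_cyclization_sites(toy)
print([(i, j) for i, j, _ in sites])  # → [(0, 5)]
```

The brute-force double loop keeps the sketch dependency-free; for full AFDB-scale chains a spatial index or a vectorized distance matrix would be the more practical choice.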
Cite
Yang et al. "CPSea: Large-Scale Cyclic Peptide-Protein Complex Dataset for Machine Learning in Cyclic Peptide Design." Advances in Neural Information Processing Systems, 2025.
@inproceedings{yang2025neurips-cpsea,
title = {{CPSea: Large-Scale Cyclic Peptide-Protein Complex Dataset for Machine Learning in Cyclic Peptide Design}},
author = {Yang, Ziyi and Xie, Hanyuan and Jia, Yinjun and Kong, Xiangzhe and Zheng, Jiqing and Zhang, Ziting and Liu, Yang and Liu, Lei and Lan, Yanyan},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/yang2025neurips-cpsea/}
}