PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design

Abstract

Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advances in targeting specific proteins, creating high-affinity binders for arbitrary protein targets on demand, without extensive rounds of wet-lab testing, remains a significant challenge. Here, we introduce PPDiff, a diffusion model that jointly designs the sequence and structure of binders for arbitrary protein targets in a non-autoregressive manner. PPDiff builds on our Sequence Structure Interleaving Network with Causal attention layers (SSINC), which integrates interleaved self-attention layers to capture global amino acid correlations, $k$-nearest neighbor ($k$NN) equivariant graph convolutional layers to model local interactions in three-dimensional (3D) space, and causal attention layers to simplify the intricate interdependencies within the protein sequence. To assess PPDiff, we curate PPBench, a general protein-protein complex dataset comprising 706,360 complexes from the Protein Data Bank (PDB). The model is pretrained on PPBench and finetuned on two real-world applications: target-protein mini-binder complex design and antigen-antibody complex design. PPDiff consistently surpasses baseline methods, achieving success rates of 50.00%, 23.16%, and 16.89% for the pretraining task and the two downstream applications, respectively.
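
As a rough illustration of two ingredients named in the abstract, the sketch below builds a $k$NN neighbor list from 3D residue coordinates and a causal (lower-triangular) attention mask. It is not the SSINC implementation: the function names `knn_graph` and `causal_mask`, the use of one coordinate per residue, and the toy coordinates are all assumptions made for this example.

```python
import math

def knn_graph(coords, k):
    """Return, for each residue i, the indices of its k nearest residues.

    coords: list of (x, y, z) tuples, one per residue (hypothetical input,
    for example C-alpha positions). Ties are broken by lower index.
    """
    n = len(coords)
    k = min(k, n - 1)  # a residue cannot have more than n-1 neighbors
    neighbors = []
    for i in range(n):
        # Sort all other residues by Euclidean distance to residue i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(coords[i], coords[j]), j),
        )
        neighbors.append(others[:k])
    return neighbors

def causal_mask(n):
    """Lower-triangular mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

# Toy chain of four residues spaced along the x axis (made-up coordinates).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
graph = knn_graph(coords, k=2)
mask = causal_mask(4)
```

Because the neighbor lists depend only on pairwise distances, they are unchanged when the whole complex is translated or rotated, which is the property that graph layers built on such neighborhoods typically rely on for equivariance.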

Cite

Text

Song et al. "PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Song et al. "PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/song2025icml-ppdiff/)

BibTeX

@inproceedings{song2025icml-ppdiff,
  title     = {{PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design}},
  author    = {Song, Zhenqiao and Li, Tianxiao and Li, Lei and Min, Martin Renqiang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {56319--56336},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/song2025icml-ppdiff/}
}