PiNUI: A Dataset of Protein-Protein Interactions for Machine Learning

Abstract

We introduce PiNUI (Protein Interactions with Nearly Uniform Imbalance), a dataset of Protein-Protein Interactions (PPI) designed specifically for Machine Learning (ML) applications. PiNUI is more representative of real-world PPI tasks than existing ML-ready PPI datasets. We achieve this by increasing data size and quality and by minimizing the sampling bias of negative interactions. We demonstrate that models trained on PiNUI almost always outperform those trained on conventional PPI datasets when evaluated on various general PPI tasks using external test sets.

Cite

Text

Dubourg-Felonneau et al. "PiNUI: A Dataset of Protein-Protein Interactions for Machine Learning." NeurIPS 2023 Workshops: AI4D3, 2023.

Markdown

[Dubourg-Felonneau et al. "PiNUI: A Dataset of Protein-Protein Interactions for Machine Learning." NeurIPS 2023 Workshops: AI4D3, 2023.](https://mlanthology.org/neuripsw/2023/dubourgfelonneau2023neuripsw-pinui/)

BibTeX

@inproceedings{dubourgfelonneau2023neuripsw-pinui,
  title     = {{PiNUI: A Dataset of Protein-Protein Interactions for Machine Learning}},
  author    = {Dubourg-Felonneau, Geoffroy and Akiva, Eyal and Wesego, Daniel and Varadan, Ranjani},
  booktitle = {NeurIPS 2023 Workshops: AI4D3},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/dubourgfelonneau2023neuripsw-pinui/}
}