Out-of-Distribution Validation for Bioactivity Prediction in Drug Discovery: Lessons from Materials Science

Abstract

Recent advances in machine learning for materials science have significantly improved the prediction of novel materials. We adapt these methods to drug discovery, focusing on how to assess model performance on out-of-distribution data. Using k-fold n-step forward cross-validation (SFCV) to evaluate bioactivity predictions for small molecules, we found this approach more effective than conventional cross-validation. We also introduce two new metrics, discovery yield and novelty error, which provide deeper insight into model applicability and prediction accuracy for drug-like molecules. Based on our findings, we recommend incorporating these metrics into state-of-the-art bioactivity prediction models for drug discovery.
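
The general idea of a step-forward split can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `step_forward_splits`, its parameters, and the choice of a per-compound scalar to sort on are all assumptions made here. Compounds are ranked by that scalar, cut into k contiguous bins, and each split trains on the first i bins and tests on the bin n steps further along, so the test set always lies outside the range seen in training.

```python
from typing import Iterator, List, Sequence, Tuple


def step_forward_splits(
    sort_values: Sequence[float], k: int = 5, n_step: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for a step-forward split.

    sort_values: one scalar per sample used to order the data
                 (hypothetical; any property that defines "forward").
    k:           number of contiguous bins after sorting.
    n_step:      how many bins ahead of the last training bin to test on.
    """
    if k < 2 or not 1 <= n_step < k:
        raise ValueError("need k >= 2 and 1 <= n_step < k")
    if len(sort_values) < k:
        raise ValueError("need at least k samples")

    # Rank sample indices by the sorting value, lowest first.
    order = sorted(range(len(sort_values)), key=lambda i: sort_values[i])

    # Cut the ranked indices into k contiguous bins of near-equal size.
    n = len(order)
    bounds = [j * n // k for j in range(k + 1)]
    bins = [order[bounds[j]:bounds[j + 1]] for j in range(k)]

    # Train on the first i bins, test on the bin n_step ahead of them.
    for i in range(1, k - n_step + 1):
        train = [idx for b in bins[:i] for idx in b]
        test = bins[i + n_step - 1]
        yield train, test


if __name__ == "__main__":
    # Toy sorting values for ten samples (made-up numbers).
    values = [0.5, 3.1, 1.2, 4.8, 2.0, 0.1, 3.9, 2.7, 1.6, 4.2]
    for train_idx, test_idx in step_forward_splits(values, k=5, n_step=1):
        print(len(train_idx), "train ->", test_idx)
```

With k bins and step n this yields k - n splits, and the training set grows with each one. Unlike random k-fold cross-validation, no test sample falls inside the range of sorting values covered by its training set.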

Cite

Text

Saha et al. "Out-of-Distribution Validation for Bioactivity Prediction in Drug Discovery: Lessons from Materials Science." ICML 2024 Workshops: ML4LMS, 2024.

Markdown

[Saha et al. "Out-of-Distribution Validation for Bioactivity Prediction in Drug Discovery: Lessons from Materials Science." ICML 2024 Workshops: ML4LMS, 2024.](https://mlanthology.org/icmlw/2024/saha2024icmlw-outofdistribution/)

BibTeX

@inproceedings{saha2024icmlw-outofdistribution,
  title     = {{Out-of-Distribution Validation for Bioactivity Prediction in Drug Discovery: Lessons from Materials Science}},
  author    = {Saha, Udit Surya and Vendruscolo, Michele and Carpenter, Anne E and Singh, Shantanu and Bender, Andreas and Seal, Srijit},
  booktitle = {ICML 2024 Workshops: ML4LMS},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/saha2024icmlw-outofdistribution/}
}