Regression Modeling on DNA Encoded Libraries

Abstract

DNA encoded libraries (DELs) are pooled, combinatorial compound collections in which each member is tagged with a unique DNA barcode. DELs are used in drug discovery for early hit finding against protein targets. Recently, several groups have proposed building machine learning models from quantities derived from DEL datasets. However, DEL datasets have a low signal-to-noise ratio, which makes modeling them challenging. To address this, we propose a novel graph neural network (GNN) based regression model that directly predicts enrichment scores from raw sequencing counts while accounting for multiple sources of technical variation and intrinsic assay noise. We show that our GNN regression model quantitatively outperforms standard classification approaches and can be used to find diverse sets of molecules in external virtual libraries.

Cite

Text

Ma et al. "Regression Modeling on DNA Encoded Libraries." NeurIPS 2021 Workshops: AI4Science, 2021.

Markdown

[Ma et al. "Regression Modeling on DNA Encoded Libraries." NeurIPS 2021 Workshops: AI4Science, 2021.](https://mlanthology.org/neuripsw/2021/ma2021neuripsw-regression/)

BibTeX

@inproceedings{ma2021neuripsw-regression,
  title     = {{Regression Modeling on DNA Encoded Libraries}},
  author    = {Ma, Ralph and Dreiman, Gabriel H. S. and Ruggiu, Fiorella and Riesselman, Adam Joseph and Liu, Bowen and James, Keith and Sultan, Mohammad and Koller, Daphne},
  booktitle = {NeurIPS 2021 Workshops: AI4Science},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/ma2021neuripsw-regression/}
}