Removing Biases from Molecular Representations via Information Maximization

Abstract

High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology for assessing and understanding the relationship between the chemical structure and the biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweights samples to equalize their implied batch distribution. Extensive experiments on drug screening data show that InfoCORE achieves superior performance across a range of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show that InfoCORE offers a versatile framework that addresses general distribution shifts and data fairness issues by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.
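To make the idea of equalizing a batch distribution concrete, the sketch below shows a much simpler, static stand-in: inverse-frequency weighting, where each sample is weighted so that every batch carries the same total weight. This is not InfoCORE's adaptive scheme, which the abstract describes only at a high level; the function name `inverse_frequency_weights` and the batch labels are hypothetical and chosen for illustration only.

```python
import numpy as np


def inverse_frequency_weights(batch_ids):
    """Return per-sample weights that give every batch equal total weight.

    Illustrative assumption only: a static inverse-frequency rule,
    not the adaptive reweighting used by InfoCORE.
    """
    batch_ids = np.asarray(batch_ids)
    # `inverse` maps each sample to the index of its batch in the unique list;
    # `counts` holds the number of samples in each unique batch.
    _, inverse, counts = np.unique(
        batch_ids, return_inverse=True, return_counts=True
    )
    n_batches = counts.size
    # Each batch receives total weight 1 / n_batches, split evenly
    # among its samples, so the weights sum to 1 overall.
    return 1.0 / (n_batches * counts[inverse])


# Hypothetical batch labels: three samples from one plate, one from another.
batch_ids = ["plate_a", "plate_a", "plate_a", "plate_b"]
weights = inverse_frequency_weights(batch_ids)
```

With these labels, each `plate_a` sample gets weight 1/6 and the single `plate_b` sample gets weight 1/2, so both plates contribute equally to any weighted average.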

Cite

Text

Wang et al. "Removing Biases from Molecular Representations via Information Maximization." NeurIPS 2023 Workshops: AI4D3, 2023.

Markdown

[Wang et al. "Removing Biases from Molecular Representations via Information Maximization." NeurIPS 2023 Workshops: AI4D3, 2023.](https://mlanthology.org/neuripsw/2023/wang2023neuripsw-removing/)

BibTeX

@inproceedings{wang2023neuripsw-removing,
  title     = {{Removing Biases from Molecular Representations via Information Maximization}},
  author    = {Wang, Chenyu and Gupta, Sharut and Uhler, Caroline and Jaakkola, Tommi},
  booktitle = {NeurIPS 2023 Workshops: AI4D3},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/wang2023neuripsw-removing/}
}