Removing Biases from Molecular Representations via Information Maximization
Abstract
High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology for assessing and understanding the relationship between the chemical structure and the biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweights samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show that InfoCORE offers a versatile framework that addresses general distribution shifts and data fairness issues by minimizing correlation with spurious features or removing sensitive attributes.
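To make the reweighting idea concrete, here is a minimal sketch of a batch-reweighted contrastive (InfoNCE-style) loss for a single anchor. It is one plausible reading of "adaptively reweights samples", not the authors' implementation: the function name `reweighted_info_nce`, the input layout, and the specific weighting rule (weight each candidate by an estimated probability that it belongs to the anchor's batch, normalized to mean 1) are all assumptions made for illustration.

```python
import math


def reweighted_info_nce(scores, batch_probs, anchor_batch):
    """Hypothetical batch-reweighted InfoNCE loss for one anchor.

    scores[j]       -- similarity between the anchor and candidate j;
                       candidate 0 is the positive, the rest are negatives.
    batch_probs[j]  -- assumed classifier estimate of P(batch | candidate j),
                       given as a list indexed by batch id.
    anchor_batch    -- batch id of the anchor.

    Assumes the positive has nonzero probability under the anchor's batch.
    """
    n = len(scores)
    # Assumed weighting rule: candidates that look like they come from the
    # anchor's batch count more, so the comparison is made within-batch.
    raw = [probs[anchor_batch] for probs in batch_probs]
    total = sum(raw)
    weights = [n * r / total for r in raw]  # normalized to mean 1

    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    terms = [w * math.exp(s - m) for w, s in zip(weights, scores)]
    return -math.log(terms[0] / sum(terms))


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5]
    # Candidates 0 and 1 look like the anchor's batch (id 0); candidate 2 does not.
    batch_probs = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
    print(reweighted_info_nce(scores, batch_probs, anchor_batch=0))
```

When the batch estimates are uninformative (every candidate is equally likely to come from the anchor's batch), all weights equal 1 and the loss reduces to the standard InfoNCE loss, which is a useful sanity check for any weighting scheme of this kind.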
Cite
Text
Wang et al. "Removing Biases from Molecular Representations via Information Maximization." International Conference on Learning Representations, 2024.
Markdown
[Wang et al. "Removing Biases from Molecular Representations via Information Maximization." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/wang2024iclr-removing/)
BibTeX
@inproceedings{wang2024iclr-removing,
title = {{Removing Biases from Molecular Representations via Information Maximization}},
author = {Wang, Chenyu and Gupta, Sharut and Uhler, Caroline and Jaakkola, Tommi S.},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/wang2024iclr-removing/}
}