Pooling Image Datasets with Multiple Covariate Shift and Imbalance
Abstract
Small sample sizes are common in many disciplines, which necessitates pooling roughly similar datasets across multiple sites/institutions to study weak but relevant associations between images and disease incidence. Such data often manifest shifts and imbalances in covariates (secondary non-imaging data). These issues are well-studied for classical models, but the ideas simply do not apply to overparameterized DNN models. Consequently, recent work has shown how strategies from fairness and invariant representation learning provide a meaningful starting point, but the current repertoire of methods remains limited to accounting for shifts/imbalances in just a couple of covariates at a time. In this paper, we show how viewing this problem from the perspective of category theory provides a simple and effective solution that completely avoids the elaborate multi-stage training pipelines that would otherwise be needed. We show the effectiveness of this approach via extensive experiments on real datasets. Further, we discuss how our style of formulation offers a unified perspective on at least five distinct problem settings in vision, from self-supervised learning to matching problems in 3D reconstruction.
Cite
Text
Chytas et al. "Pooling Image Datasets with Multiple Covariate Shift and Imbalance." International Conference on Learning Representations, 2024.
Markdown
[Chytas et al. "Pooling Image Datasets with Multiple Covariate Shift and Imbalance." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/chytas2024iclr-pooling/)
BibTeX
@inproceedings{chytas2024iclr-pooling,
title = {{Pooling Image Datasets with Multiple Covariate Shift and Imbalance}},
author = {Chytas, Sotirios Panagiotis and Lokhande, Vishnu Suresh and Singh, Vikas},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/chytas2024iclr-pooling/}
}