An Information Criterion for Controlled Disentanglement of Multimodal Data
Abstract
Multimodal representation learning seeks to relate and decompose information available in multiple modalities. By disentangling modality-specific information from information that is shared across modalities, we can improve interpretability and robustness, and enable tasks like counterfactual generation. However, separating these components is challenging due to their deep entanglement in real-world data. We propose $\textbf{Disentangled}$ $\textbf{S}$elf-$\textbf{S}$upervised $\textbf{L}$earning (DisentangledSSL), a novel self-supervised approach that effectively learns disentangled representations, even when the so-called $\textit{Minimum Necessary Information}$ (MNI) point is not achievable. It outperforms baselines on multiple synthetic and real-world datasets, excelling in downstream tasks including prediction on vision-language data and molecule-phenotype retrieval on biological data.
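For readers unfamiliar with the term, the $\textit{Minimum Necessary Information}$ (MNI) point has a standard information-theoretic characterization (stated here following Fischer's Conditional Entropy Bottleneck formulation; the paper's exact definition may differ in detail). For two modalities $X_1, X_2$, a shared representation $Z$ of $X_1$ attains the MNI point when it captures all and only the information that $X_1$ holds about $X_2$:

$$
I(X_1; Z) = I(X_2; Z) = I(X_1; X_2),
$$

i.e., $Z$ is sufficient ($I(X_2; Z) = I(X_1; X_2)$) and minimal ($I(X_1; Z) = I(X_1; X_2)$) at the same time. The abstract's claim is that real-world modality pairs often make this equality unattainable, and that DisentangledSSL still learns disentangled representations in that regime.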
Cite
Text
Wang et al. "An Information Criterion for Controlled Disentanglement of Multimodal Data." NeurIPS 2024 Workshops: UniReps, 2024.
Markdown
[Wang et al. "An Information Criterion for Controlled Disentanglement of Multimodal Data." NeurIPS 2024 Workshops: UniReps, 2024.](https://mlanthology.org/neuripsw/2024/wang2024neuripsw-information-a/)
BibTeX
@inproceedings{wang2024neuripsw-information-a,
title = {{An Information Criterion for Controlled Disentanglement of Multimodal Data}},
author = {Wang, Chenyu and Gupta, Sharut and Zhang, Xinyi and Tonekaboni, Sana and Jegelka, Stefanie and Jaakkola, Tommi and Uhler, Caroline},
booktitle = {NeurIPS 2024 Workshops: UniReps},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/wang2024neuripsw-information-a/}
}