Correcting Flaws in Common Disentanglement Metrics
Abstract
Disentangled representations are those in which distinct features, such as size or shape, are represented by distinct neurons. Quantifying the extent to which a given representation is disentangled is not straightforward, and multiple metrics have been proposed. In this paper, we identify two flaws in existing metrics that allow them to assign a high score to a model that is still entangled, and we propose two new metrics that redress these problems. First, we use hypothetical toy examples to demonstrate the failure modes we identify in existing metrics. Then, we show that similar situations occur in practice. Finally, we validate our metrics on the downstream task of compositional generalization. We measure the performance of six existing disentanglement models on this downstream task, and show that performance is (a) generally quite poor, (b) correlated, to varying degrees, with most disentanglement metrics, and (c) most strongly correlated with our newly proposed metrics. Anonymized code to reproduce our results is available at https://github.com/anon296/anon.
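To make concrete what "quantifying disentanglement" involves, the sketch below implements one widely used existing metric, the Mutual Information Gap (MIG), which scores a representation by how exclusively each ground-truth factor is captured by a single latent neuron. This is an illustrative, histogram-based estimate, not the paper's proposed metrics; the bin count and the toy data are assumptions for the example.

```python
import numpy as np

def discrete_mutual_information(x, y, bins=20):
    """Estimate MI between two 1-D variables via histogram discretization."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal over x bins
    py = joint.sum(axis=0, keepdims=True)  # marginal over y bins
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def mutual_information_gap(latents, factors, bins=20):
    """MIG: for each factor, the gap between the highest and second-highest
    MI with any latent, normalized by the factor's entropy, averaged over factors."""
    gaps = []
    for j in range(factors.shape[1]):
        mis = np.array([
            discrete_mutual_information(latents[:, i], factors[:, j], bins)
            for i in range(latents.shape[1])
        ])
        counts = np.histogram(factors[:, j], bins=bins)[0].astype(float)
        p = counts[counts > 0] / counts.sum()
        entropy = -(p * np.log(p)).sum()
        top2 = np.sort(mis)[::-1][:2]
        gaps.append((top2[0] - top2[1]) / entropy)
    return float(np.mean(gaps))

# Toy check: latents 0 and 1 cleanly encode factors 0 and 1; latent 2 is noise.
rng = np.random.default_rng(0)
factors = rng.uniform(size=(5000, 2))
latents = np.concatenate(
    [factors + 0.01 * rng.normal(size=factors.shape),
     rng.uniform(size=(5000, 1))], axis=1)
print(mutual_information_gap(latents, factors))  # high score: near-perfect alignment
```

A score near 1 indicates each factor is captured almost exclusively by one latent; a score near 0 indicates the information is spread across latents. The failure modes discussed in the paper arise precisely because a gap-style score like this can remain high in situations where the representation is still entangled.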
Cite
Text
Mahon et al. "Correcting Flaws in Common Disentanglement Metrics." Transactions on Machine Learning Research, 2024.
Markdown
[Mahon et al. "Correcting Flaws in Common Disentanglement Metrics." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/mahon2024tmlr-correcting/)
BibTeX
@article{mahon2024tmlr-correcting,
  title   = {{Correcting Flaws in Common Disentanglement Metrics}},
  author  = {Mahon, Louis and Sha, Lei and Lukasiewicz, Thomas},
  journal = {Transactions on Machine Learning Research},
  year    = {2024},
  url     = {https://mlanthology.org/tmlr/2024/mahon2024tmlr-correcting/}
}