Closed-Loop Unsupervised Representation Disentanglement with $\\beta$-VAE Distillation and Diffusion Probabilistic Feedback

Abstract

Representation disentanglement may help AI fundamentally understand the real world and thus benefit both discrimination and generation tasks. It currently has at least three unresolved core issues: (i) heavy reliance on label annotation and synthetic data — causing poor generalization on natural scenarios; (ii) heuristic/hand-craft disentangling constraints make it hard to adaptively achieve an optimal training trade-off; (iii) lacking reasonable evaluation metric, especially for the real label-free data. To address these challenges, we propose a Closed-Loop unsupervised representation Disentanglement approach dubbed CL-Dis. Specifically, we use diffusion-based autoencoder (Diff-AE) as a backbone while resorting to β-VAE as a co-pilot to extract semantically disentangled representations. The strong generation ability of diffusion model and the good disentanglement ability of VAE model are complementary. To strengthen disentangling, VAE-latent distillation and diffusion-wise feedback are interconnected in a closed-loop system for a further mutual promotion. Then, a self-supervised Navigation strategy is introduced to identify interpretable semantic directions in the disentangled latent space. Finally, a new metric based on content tracking is designed to evaluate the disentanglement effect. Experiments demonstrate the superiority of CL-Dis on applications like real image manipulation and visual analysis.

Cite

Text

Jin et al. "Closed-Loop Unsupervised Representation Disentanglement with $\\beta$-VAE Distillation and Diffusion Probabilistic Feedback." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72995-9_16

Markdown

[Jin et al. "Closed-Loop Unsupervised Representation Disentanglement with $\\beta$-VAE Distillation and Diffusion Probabilistic Feedback." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/jin2024eccv-closedloop/) doi:10.1007/978-3-031-72995-9_16

BibTeX

@inproceedings{jin2024eccv-closedloop,
  title     = {{Closed-Loop Unsupervised Representation Disentanglement with $\\beta$-VAE Distillation and Diffusion Probabilistic Feedback}},
  author    = {Jin, Xin and Li, Bohan and Xie, Baao and Zhang, Wenyao and Liu, Jinming and Li, Ziqiang and Yang, Tao and Zeng, Wenjun},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72995-9_16},
  url       = {https://mlanthology.org/eccv/2024/jin2024eccv-closedloop/}
}