Understanding Diffusion-Based Representation Learning via Low-Dimensional Modeling
Abstract
This work addresses the critical question of why and when diffusion models, despite their generative design, are capable of learning high-quality representations in a self-supervised manner. We hypothesize that diffusion models excel at representation learning due to their ability to learn the low-dimensional distributions of image datasets by optimizing a noise-controlled denoising objective. Our empirical results support this hypothesis, indicating that variations in the representation learning performance of diffusion models across noise levels are closely linked to the quality of the corresponding posterior estimation. Grounded in this observation, we offer theoretical insights into the unimodal representation dynamics of diffusion models as noise scales vary, demonstrating how they effectively learn meaningful representations through the denoising process. We also highlight the impact of the inherent parameter-sharing mechanism in diffusion models, which accounts for their advantages over traditional denoising autoencoders in representation learning.
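To make the setup concrete, the sketch below illustrates the standard DDPM-style noise-prediction objective and a feature probe at a fixed noise level, mirroring the noise-level dependence discussed in the abstract. This is a minimal illustration, not the authors' implementation: the `DenoiserMLP` network, the linear beta schedule, and the `extract_features` probe with its `t_star` parameter are hypothetical stand-ins for the U-Net denoisers and probing protocols used in practice.

```python
import torch
import torch.nn as nn

# Standard variance-preserving forward-diffusion schedule (linear betas).
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t

class DenoiserMLP(nn.Module):
    """Hypothetical tiny denoiser; real diffusion models use U-Nets."""
    def __init__(self, dim=784, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim + 1, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x_t, t):
        # Crude time conditioning via a scalar channel; real models
        # use sinusoidal time embeddings.
        h = self.encoder(torch.cat([x_t, t.float().unsqueeze(-1) / T], dim=-1))
        return self.decoder(h), h  # predicted noise and intermediate feature

def denoising_loss(model, x0):
    """Sample t uniformly, corrupt x0, and regress the added noise
    (the noise-controlled denoising objective)."""
    t = torch.randint(0, T, (x0.shape[0],))
    a = alpha_bar[t].unsqueeze(-1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps  # forward diffusion
    eps_hat, _ = model(x_t, t)
    return ((eps_hat - eps) ** 2).mean()

@torch.no_grad()
def extract_features(model, x0, t_star=100):
    """Probe intermediate activations at one noise level t_star;
    representation quality varies with the choice of t_star."""
    t = torch.full((x0.shape[0],), t_star)
    a = alpha_bar[t].unsqueeze(-1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * torch.randn_like(x0)
    _, h = model(x_t, t)
    return h
```

Sweeping `t_star` in `extract_features` exposes the knob the abstract refers to: the same trained network yields different representations at different noise levels, consistent with the unimodal trend the paper analyzes.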
Cite
Text
Li et al. "Understanding Diffusion-Based Representation Learning via Low-Dimensional Modeling." NeurIPS 2024 Workshops: M3L, 2024.
Markdown
[Li et al. "Understanding Diffusion-Based Representation Learning via Low-Dimensional Modeling." NeurIPS 2024 Workshops: M3L, 2024.](https://mlanthology.org/neuripsw/2024/li2024neuripsw-understanding/)
BibTeX
@inproceedings{li2024neuripsw-understanding,
title = {{Understanding Diffusion-Based Representation Learning via Low-Dimensional Modeling}},
author = {Li, Xiao and Zhang, Zekai and Li, Xiang and Chen, Siyi and Zhu, Zhihui and Wang, Peng and Qu, Qing},
booktitle = {NeurIPS 2024 Workshops: M3L},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/li2024neuripsw-understanding/}
}