Generalization of Diffusion Models Arises with a Balanced Representation Space
Abstract
Diffusion models excel at generating high-quality, diverse samples, yet they risk memorizing training data when overfit to the training objective. We analyze the distinctions between memorization and generalization in diffusion models through the lens of representation learning. By investigating a two-layer ReLU denoising autoencoder (DAE), we prove that: *(i)* memorization corresponds to the model storing the raw training data in its learned encoding and decoding weights, yielding localized, spiky representations; whereas *(ii)* generalization arises when the model captures local data statistics, producing balanced representations. Furthermore, we validate our theoretical findings on real-world unconditional and text-to-image diffusion models, demonstrating that the same representation structures emerge in deep generative models, with significant practical implications. Building on these insights, we propose a representation-based method for detecting memorization and a training-free editing technique that allows precise control via representation steering. Together, our results highlight that *learning good representations is central to novel and meaningful generative modelling*. Code is available at https://github.com/la0ka1/diffusion-gen-from-rep.
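As a rough illustration of claim *(i)*, the sketch below builds a toy two-layer ReLU DAE whose encoder and decoder weights are the training samples themselves, and shows that a training sample then activates a single hidden unit. This is not the paper's construction or its detection method: the orthonormal toy data, the helper names (`dae_hidden`, `dae_forward`, `peak_share`), and the "peak share" spikiness score are all assumptions made for illustration only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dae_hidden(x, W_enc):
    # Hidden representation of a two-layer ReLU DAE (no biases, for simplicity).
    return relu(W_enc @ x)

def dae_forward(x, W_enc, W_dec):
    # Decode the hidden representation back to data space.
    return W_dec @ dae_hidden(x, W_enc)

def peak_share(h, eps=1e-12):
    # Hypothetical spikiness score: share of total activation on the largest unit.
    # Close to 1 means one unit dominates; 1/len(h) means perfectly even.
    return float(h.max() / (h.sum() + eps))

# Toy training set (assumption): 4 orthonormal samples in R^8.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
X = Q[:4]  # each row is one training sample

# "Memorizing" weights: the raw training samples are stored in both layers.
W_enc, W_dec = X, X.T

h = dae_hidden(X[0], W_enc)             # approximately one-hot
recon = dae_forward(X[0], W_enc, W_dec)  # reproduces the training sample

print("hidden:", np.round(h, 3))
print("peak share (memorizing weights):", round(peak_share(h), 3))
print("peak share (evenly spread reference):", peak_share(np.ones(4)))
```

With these weights the training sample is reconstructed exactly and its representation concentrates on one unit, which is the "localized, spiky" pattern the abstract describes. The evenly spread vector is only a reference point for the score; the sketch does not model the generalizing regime of claim *(ii)*.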
Cite
Text
Zhang et al. "Generalization of Diffusion Models Arises with a Balanced Representation Space." International Conference on Learning Representations, 2026.
Markdown
[Zhang et al. "Generalization of Diffusion Models Arises with a Balanced Representation Space." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-generalization/)
BibTeX
@inproceedings{zhang2026iclr-generalization,
title = {{Generalization of Diffusion Models Arises with a Balanced Representation Space}},
author = {Zhang, Zekai and Li, Xiao and Li, Xiang and Shi, Lianghe and Wu, Meng and Tao, Molei and Qu, Qing},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zhang2026iclr-generalization/}
}