PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation
Abstract
Diffusion models have made significant advancements in recent years. However, their performance often deteriorates when trained or fine-tuned on imbalanced datasets. This degradation is largely due to the disproportionate representation of majority and minority data in image-text pairs. In this paper, we propose a general fine-tuning approach, dubbed PoGDiff, to address this challenge. Rather than directly minimizing the KL divergence between the predicted and ground-truth distributions, PoGDiff replaces the ground-truth distribution with a Product of Gaussians (PoG), which is constructed by combining the original ground-truth targets with the predicted distribution conditioned on a neighboring text embedding. Experiments on real-world datasets demonstrate that our method effectively addresses the imbalance problem in diffusion models, improving both generation accuracy and quality.
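The core step described above is replacing the single ground-truth Gaussian target with a Product of Gaussians (PoG). As a point of reference only (not the paper's implementation), the product of two isotropic Gaussians is again Gaussian, with summed precision and a precision-weighted mean. Below is a minimal PyTorch sketch of that combination applied to a denoising target; the tensor names (`eps_gt`, `eps_neighbor`) and the equal-variance choice are illustrative assumptions.

```python
import torch

def product_of_gaussians(mu1, var1, mu2, var2):
    """Combine two isotropic Gaussians N(mu1, var1*I) and N(mu2, var2*I).

    Their (renormalized) product is again Gaussian, with
    precision = 1/var1 + 1/var2 and a precision-weighted mean.
    """
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var

# Hypothetical usage inside a fine-tuning step (names are illustrative):
# eps_gt       -- original denoising target for the current image-text pair
# eps_neighbor -- model prediction conditioned on a neighboring text embedding
eps_gt = torch.randn(4, 4, 64, 64)
eps_neighbor = torch.randn(4, 4, 64, 64)

# Combine the two targets into a single PoG target; with equal variances
# this reduces to a simple average with halved variance.
target_mu, target_var = product_of_gaussians(eps_gt, 1.0, eps_neighbor, 1.0)
# Training would then regress toward `target_mu` instead of `eps_gt` alone.
```

With unequal variances, the PoG mean shifts toward whichever component is more confident (lower variance), which is one way the neighbor-conditioned prediction can be weighted against the original target.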
Cite
Text
Wang et al. "PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation." Advances in Neural Information Processing Systems, 2025.
Markdown
[Wang et al. "PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wang2025neurips-pogdiff/)
BibTeX
@inproceedings{wang2025neurips-pogdiff,
  title     = {{PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation}},
  author    = {Wang, Ziyan and Wei, Sizhe and Huo, Xiaoming and Wang, Hao},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/wang2025neurips-pogdiff/}
}