Compensation Sampling for Improved Convergence in Diffusion Models

Abstract

Diffusion models achieve remarkable quality in image generation, but at a cost: iterative denoising requires many time steps to produce high-fidelity images. The denoising process is crucially limited by the accumulation of reconstruction error, caused by an initially inaccurate reconstruction of the target data. This leads to lower-quality outputs and slower convergence. To address these issues, we propose compensation sampling to guide the generation towards the target domain. We introduce a compensation term, implemented as a U-Net, which adds negligible training overhead. Our approach is flexible, and we demonstrate its application to unconditional generation, face inpainting, and face de-occlusion on the benchmark datasets CIFAR-10, CelebA, CelebA-HQ, FFHQ-256, and FSG. Our approach consistently yields state-of-the-art results in terms of image quality, while accelerating convergence during training by up to an order of magnitude.
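To make the idea concrete, below is a minimal sketch of how a compensation term could enter a standard DDPM-style reverse step. It assumes the compensation network (here `comp_model`, a lightweight U-Net) adds an additive correction to the predicted clean image before the posterior mean is formed; the names `eps_model`, `comp_model`, and the exact placement of the correction are illustrative assumptions, not the paper's precise formulation.

```python
import torch

def reverse_step(x_t, t, eps_model, comp_model, alphas_cumprod, betas):
    """One DDPM-style reverse step with an illustrative compensation term.

    eps_model  : noise-prediction U-Net (standard diffusion backbone)
    comp_model : lightweight U-Net whose output nudges the predicted
                 clean image toward the target data domain (assumption)
    """
    a_bar_t = alphas_cumprod[t]
    a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t

    # Predict the noise and form the usual estimate of the clean image x0.
    eps = eps_model(x_t, t)
    x0_hat = (x_t - torch.sqrt(1.0 - a_bar_t) * eps) / torch.sqrt(a_bar_t)

    # Hypothetical compensation: an additive correction on x0_hat.
    x0_hat = (x0_hat + comp_model(x_t, t)).clamp(-1.0, 1.0)

    # Posterior mean of q(x_{t-1} | x_t, x0_hat), as in standard DDPM.
    coef_x0 = torch.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar_t)
    coef_xt = torch.sqrt(alpha_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    mean = coef_x0 * x0_hat + coef_xt * x_t

    if t == 0:
        return mean
    var = beta_t * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    return mean + torch.sqrt(var) * torch.randn_like(x_t)
```

The only change relative to a vanilla DDPM step is the single call to `comp_model`, which is why the training overhead of such a term can remain small; everything else follows the usual reverse-diffusion update.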

Cite

Text

Lu et al. "Compensation Sampling for Improved Convergence in Diffusion Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73030-6_11

Markdown

[Lu et al. "Compensation Sampling for Improved Convergence in Diffusion Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/lu2024eccv-compensation/) doi:10.1007/978-3-031-73030-6_11

BibTeX

@inproceedings{lu2024eccv-compensation,
  title     = {{Compensation Sampling for Improved Convergence in Diffusion Models}},
  author    = {Lu, Hui and Salah, Albert Ali and Poppe, Ronald},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73030-6_11},
  url       = {https://mlanthology.org/eccv/2024/lu2024eccv-compensation/}
}